On May 15, 2010, at 3:38 AM, Stefan Behnel wrote:

> Dag Sverre Seljebotn, 15.05.2010 11:28:
>> Stefan Behnel wrote:
>>> latest cython-devel can infer the type of a for-loop variable when
>>> iterating over C arrays, C pointers and Python strings. It will infer
>>> Py_UNICODE for unicode strings, but plain 'object' for a bytes string, as
>>> this returns sliced strings in Py2 and integers in Py3, so there is no
>>> common C type. So the following will infer c to be a plain Python object:
>>>
>>>     cdef bytes s = b'abcdefg'
>>>
>>>     c = s[4]
>>>     for c in s:
>>>         pass
>>>
>>> However, this:
>>>
>>>     c = b'abcdefg'[4]
>>>     for c in b'abcdefg':
>>>         pass
>>>
>>> will infer 'char' for c, as the bytes literal starts off as a char* string.
>>> The main problem here is that 'char' does not behave like a Python bytes
>>> object at all. I doubt that iterating over bytes literals is a common use
>>> case, but I'm not sure about the 'least surprising' thing to do here.
>>>
>>> Should we special case this to prevent breaking Python-2 semantics, or
>>> should we expect that users will usually want 'char' as a result anyway?
>>>
>>> Both behaviours are easy to get with a simple cast, so this is really only
>>> a matter of consistency and least surprise. The thing that really bites me
>>> here is that the bytes type in Py3 *does* return integers on iteration. So
>>> returning 'char' on indexing and iteration would be both more efficient and
>>> more future proof. But it would also be impossible to keep consistent in
>>> Python-2, as faking it would mean that an untyped bytes object would return
>>> a substring, whereas a typed one would return an integer. And I don't
>>> really want to inject a type check branch into each getitem call to
>>> override that behaviour...
>>>
>>> So ISTM that the only way to make this consistent is to follow Python 2 for
>>> now, including literals, and to accept the different (but also consistent)
>>> behaviour when running in Python 3.
>>>
>> "In the face of ambiguity, refuse the temptation to guess"?
>
> I'm not sure this applies here. We have existing Python semantics for this,
> after all. They just differ between Python 2 and Python 3. This is just a
> case where we can't easily guarantee one specific behaviour as we do not
> control the type's implementation.

Sounds like a perfect example of where the -3 flag should be used.

>> I.e., I'd just disallow it from the language (that is, require a cast),
>> because of this issue. I don't see iterating over string literals as
>> important enough that one can't require a cast.
>
> Indexing based on a dynamically calculated index may be somewhat more
> important though, and a cast makes that a lot more ugly. Also, requiring a
> cast would prevent us from compiling Python code that uses this (I guess
> that makes a case for following the Py2 semantics).

+1

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
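
For reference, the Py2/Py3 difference discussed in this thread can be reproduced in plain Python (not Cython). This is only a rough sketch; the variable names are made up for illustration and none of it reflects how Cython implements its type inference. It shows what indexing and iterating a bytes object give you, plus two spellings that behave the same way on both major versions.

```python
import sys

s = b'abcdefg'

# Plain indexing: an int on Python 3, a length-1 str on Python 2.
item = s[4]

# Iteration follows the same rule as indexing on each version.
items = [c for c in s]

# Slicing gives a length-1 bytes object on both versions.
as_bytes = s[4:5]

# Indexing a bytearray gives an int on both versions.
as_int = bytearray(s)[4]

print(sys.version_info[0], type(item).__name__, type(as_bytes).__name__, as_int)
```

Code that must behave identically on both versions could use the slice or bytearray spelling instead of relying on s[4] directly.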
