Re: [Cython] String types with Python 2.x and 3.x

Stefan Behnel Sat, 12 Sep 2009 10:00:17 -0700

Dominic Sacré wrote:
> Well, actually my module only needs to be able to handle ASCII, because 
> the protocol it implements doesn't support anything else.
> So it seems weird, and in many cases very cumbersome, to use unicode 
> internally, especially with Py2, where most strings coming from 
> Python won't be unicode in the first place.


Then this sounds like a case for using ASCII encoded byte strings internally.


> I think I'll try to go back to that approach again, and insert 
> encoding/decoding wherever necessary to make sure that no unicode 
> strings get in, and no byte strings get out...

You should write a little input normalisation function that does a quick
check with PyString_CheckExact() as a fast path.
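A minimal sketch of such a normalisation function, written in plain Python 3 for illustration. The name normalise_ascii() is hypothetical. The `type(value) is bytes` test plays the role of the exact-type fast path (roughly what PyString_CheckExact() does under Py2, or PyBytes_CheckExact() under Py3), and unicode input is encoded as ASCII so that non-ASCII text fails early:

```python
def normalise_ascii(value):
    """Return `value` as a plain ASCII bytes object (hypothetical helper)."""
    # Fast path: an exact bytes object is passed through unchanged.
    if type(value) is bytes:
        return value
    # Unicode text is encoded; non-ASCII characters raise UnicodeEncodeError.
    if isinstance(value, str):
        return value.encode('ascii')
    # Bytes subclasses and bytearray are copied into a plain bytes object.
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)
```

Checking the exact type first keeps the common case (data already in the right form) cheap; the slower isinstance() checks only run for everything else.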

You might also want to check the strings for things like \0 bytes and byte
values of 0x80 or above in that case. Users will usually be happy to get an
exception instead of having to chase weird bugs caused by dirty data.
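A sketch of such a check in Python 3, assuming the data has already been converted to bytes. The helper name check_clean_ascii() is hypothetical; it raises on the first NUL byte or the first byte outside 7-bit ASCII:

```python
def check_clean_ascii(data):
    """Raise ValueError on NUL bytes or bytes >= 0x80 (hypothetical helper)."""
    # Iterating over a bytes object in Python 3 yields ints.
    for pos, byte in enumerate(data):
        if byte == 0:
            raise ValueError("NUL byte at position %d" % pos)
        if byte >= 0x80:
            raise ValueError("non-ASCII byte 0x%02x at position %d" % (byte, pos))
    return data
```

Reporting the offending position makes it easier for users to find the dirty data at its source.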


> By the way, another issue I've stumbled upon:
> With Py3, str(42) does not work as one would expect, because it actually 
> creates a bytes object of length 42, filled with zeroes. Should this be 
> considered a bug, or is it just one of the awkward consequences of 'str' 
> meaning 'bytes' with Py3?

It's the same kind of thing as the bytes type in Py 2.6 not behaving quite
as you'd expect (there, bytes is just an alias for str, so bytes(42) gives '42').
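A short Python 3 illustration of the behaviour described in the question. Passing an int to bytes() means "length", while converting the number to text and then encoding it explicitly gives the ASCII digits:

```python
# Python 3 semantics.
zeros = bytes(42)                       # 42 NUL bytes, not the digits "42"
text = str(42)                          # the text type converts the number
ascii_bytes = str(42).encode('ascii')   # explicit conversion to the bytes b'42'
```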

You'll notice similar differences for other builtins across Python
versions, e.g. Py3's zip(), which returns an iterator instead of a list.
Those are things you have to live with.
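A small Python 3 example of the zip() difference. Its result is a lazy iterator that can only be consumed once, so code that relied on getting a list under Py2 needs an explicit list() call:

```python
pairs = zip([1, 2, 3], "abc")   # Python 3: a lazy iterator, not a list
first_pass = list(pairs)        # materialise the pairs once
second_pass = list(pairs)       # the iterator is already exhausted
```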

Note that this is still different from *literals* meaning different things
in different Python versions. The exact behaviour of builtins (and any
other objects) can naturally change when running in different environments.

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
