Hi,

Lisandro Dalcin wrote:
> On 6/4/08, Dag Sverre Seljebotn <[EMAIL PROTECTED]> wrote:
>>  Well, in my mind, this is a reason for supporting Stefan in removing
>>  auto-coercion of char* to Python strings altogether (that is suggested
>>  once down in those unicode discussion threads, right Stefan?).
> 
> Well, I believe that is the right approach. However, what would be the
> way to generate a byte string from a char* pointer?

The Python equivalent of a C char* is a byte string ("bytes" or "bytearray" in
Py3). I totally support auto-coercion between byte strings and char*. I'm just
opposed to coercing a unicode object to a char*, as that tends to be an easy
source of bugs rather than something that makes your life easier.
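
To make the distinction concrete, here is a minimal sketch in plain Python 3 (not Cython, and not code from lxml). The helper name `to_c_string` and its default encoding are assumptions made up for illustration. It passes byte strings through unchanged and turns a unicode string into bytes only when an encoding is named explicitly at the call site, which is the opposite of silent coercion.

```python
def to_c_string(value, encoding="utf-8"):
    """Return a byte string that could be handed to a char* parameter.

    Hypothetical helper: the name and the UTF-8 default are illustrative
    assumptions, not part of Cython or lxml.
    """
    if isinstance(value, bytes):
        # Already a byte string: the natural counterpart of char*.
        return value
    if isinstance(value, str):
        # Unicode text has no single byte representation, so the
        # encoding is a visible parameter rather than a hidden default.
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)
```

With this shape the caller decides the encoding, so `to_c_string(u, "latin-1")` and `to_c_string(u, "utf-8")` are visibly different requests instead of one hidden default.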

My current favourite example is file names. lxml deals with two types of path names:
URLs and filesystem paths. Both are UTF-8 encoded when coming from an XML file
and users commonly pass byte strings in Py2 and unicode strings in Py3. UTF-8
encoded URLs are fine in the case of libxml2, but to access a file on the
local file system, the file name must always use the local file system
encoding, which is often an ISO encoding or something like cp1252 (IIRC). So it is
actually pretty involved to encode a file path (once you know that it actually
*is* a file path and not a URL), or to decode a user-provided byte string path
into a unicode string, e.g. to print it in an error message (which must use
the encoding of the output device!).
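
A rough sketch of the two directions described above, in plain Python 3. This is not how lxml does it. The function names, the `"://"` URL heuristic, and the `fs_encoding` override parameter are all assumptions for illustration; only `sys.getfilesystemencoding()` is a real standard library call.

```python
import sys


def looks_like_url(path):
    # Hypothetical heuristic: real URL detection is more involved.
    return "://" in path


def encode_path(path, fs_encoding=None):
    """Encode a unicode path to bytes for a C library call."""
    if isinstance(path, bytes):
        # Assume the caller already encoded it correctly.
        return path
    if looks_like_url(path):
        # URLs are passed on as UTF-8.
        return path.encode("utf-8")
    # Local file names must use the file system encoding instead.
    return path.encode(fs_encoding or sys.getfilesystemencoding())


def path_for_message(raw, fs_encoding=None):
    """Decode a user-provided byte string path for display."""
    if isinstance(raw, str):
        return raw
    # "replace" keeps error reporting from failing on undecodable bytes.
    return raw.decode(fs_encoding or sys.getfilesystemencoding(), "replace")
```

Note that `path_for_message` only gets as far as a unicode string. Writing that string out still has to encode it for the output device, which is a separate step with its own encoding.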

Encodings can really, really become a complex matter...

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
