On Sep 9, 2009, at 11:24 AM, Stefan Behnel wrote:

> Robert Bradshaw wrote:
>> I've thought about this some more. Given the amount of casting it would
>> take to get the C compiler to not complain when treating unsigned char*
>> as strings, I actually don't think it's at all natural to convert
>> strings to unsigned char*, so the double cast above seems like the
>> right thing to do.
>
> Regarding the "natural" bit, libxml2 actually defines all its UTF-8
> encoded byte strings as "unsigned char*". So, except for serialised XML,
> basically every string you get from libxml2 uses that. This is so
> inconvenient to work with in Cython that the original author of lxml
> actually went for the simple 'solution' of declaring everything as plain
> char* and passing "-w" to gcc (which is still in use today, although it
> already bit me more than once).
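
For illustration, the signedness mismatch described above looks roughly
like this in plain C. The typedef `xml_char_t` and both helper names are
hypothetical stand-ins (libxml2's own type is `xmlChar`), so treat this as
a sketch of the casting involved, not as lxml's or Cython's actual code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlChar, which is an unsigned char. */
typedef unsigned char xml_char_t;

/* Byte length of a NUL-terminated UTF-8 string held as unsigned char*.
 * strlen() takes const char*, so an explicit cast is needed; without it
 * gcc reports a pointer-signedness warning (-Wpointer-sign). */
static size_t utf8_byte_len(const xml_char_t *s)
{
    assert(s != NULL);
    return strlen((const char *)s);
}

/* The reverse direction: hand a plain C string to an API that expects
 * unsigned char*. The cast only changes the pointer type, not the bytes. */
static const xml_char_t *as_xml_chars(const char *s)
{
    return (const xml_char_t *)s;
}
```

Declaring everything as plain char* and silencing gcc with "-w" avoids
writing these casts, but it also hides every other warning.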

Interesting. One of the reasons I was so quick to discard this is
that I thought the use case was that null characters needed to be
embedded, which is completely orthogonal, and I couldn't think of
anywhere I'd come across unsigned char* used for strings (but clearly
libxml2 is such a library).

Just out of curiosity, does it use char* for ASCII and unsigned char*
for UTF-8 as a poor man's typechecking for encoding?

> I don't think there's anything wrong with letting Cython do the
> necessary casting under the hood.


http://trac.cython.org/cython_trac/ticket/359

- Robert

