On Sep 9, 2009, at 11:24 AM, Stefan Behnel wrote:

> Robert Bradshaw wrote:
>> I've thought about this some more, and the amount of casting it would
>> take to get the C compiler to not complain when trying to treat
>> unsigned char* as strings, I actually don't think it's any natural to
>> convert strings to unsigned char*, so the double cast above seems
>> like the right thing to do
>
> Regarding the "natural" bit, libxml2 actually defines all its UTF-8
> encoded byte strings as "unsigned char*". So, except for serialised XML,
> basically every string you get from libxml2 uses that. This is so
> inconvenient to work with in Cython that the original author of lxml
> actually went for the simple 'solution' of declaring everything as plain
> char* and passing "-w" to gcc (which is still in use today, although it
> already bit me more than once).

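As an aside, here is a minimal C sketch of the friction described in the quote above. It is only an illustration: `xml_char`, `get_node_name` and `name_length` are hypothetical names standing in for a library that hands out `unsigned char*` strings, and none of this is taken from libxml2, lxml or Cython itself. Passing such a pointer directly to a `char*` function such as `strlen()` typically makes gcc warn about differing pointer signedness (`-Wpointer-sign`); one explicit cast avoids the warning without resorting to `-w`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a library's "unsigned char" string type
 * (an assumption for this sketch, not the real libxml2 header). */
typedef unsigned char xml_char;

/* Hypothetical library call that returns a NUL-terminated UTF-8
 * byte string as unsigned char*. */
static const xml_char *get_node_name(void)
{
    static const xml_char name[] = { 'r', 'o', 'o', 't', '\0' };
    return name;
}

/* strlen() expects const char*. Passing s without the cast typically
 * triggers a pointer-signedness warning; the explicit cast makes the
 * reinterpretation visible and keeps other warnings enabled. */
static size_t name_length(const xml_char *s)
{
    return strlen((const char *)s);
}
```

The cast only changes how the pointer is typed, not the bytes it points to, so the same conversion could in principle be generated automatically rather than written by hand at every call site.
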
Interesting. One of the reasons I was so quick to discard this is that I
thought the use case was that null characters needed to be embedded, which
is completely orthogonal, and I couldn't think of anywhere I'd come across
unsigned char* used for strings (but clearly libxml2 is such a library).
Just out of curiosity, does it use char* for ASCII and unsigned char* for
UTF-8 as a poor man's type checking for encoding?

> I don't think there's anything wrong with letting Cython do the
> necessary casting under the hood.

http://trac.cython.org/cython_trac/ticket/359

- Robert

_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
