> You say "that's fine" but my issue was one of usability, which hasn't
> been addressed.
I think this might be a good point to try listing the actual use cases. I'll make a start and you can see if you find more, and how important each of them is. There seems to be a use case for every possible stance (which I'll enumerate as UTF-8 auto-conversion, ASCII auto-conversion failing on non-ASCII data, and no automatic conversion), so it is a matter of weighing the importance of each.

Interfacing with C code/libs:

- Language libraries (spell checking etc.). These will often work in one specific encoding, or allow you to specify the encoding the data is in; typically, one would want to be explicit about conversions in this case.

- Passing filenames. This seems to be a common case: open a file picker in a Python GUI lib and pass the resulting filename to a library taking a datafile parameter. Assuming the file picker returns a str/unicode (it would be nice if it returned bytes though), auto-conversion would be nice to have; however, UTF-8 would be the wrong choice on many platforms (including Windows, I think? Not sure about Vista.)

- Getting error messages. These are likely to be in either a hard-coded encoding or the platform default, with no guarantee of UTF-8, so they require encoding consciousness.

- Passing UI messages. Think of writing a wrapper around a GUI lib. In that case it is again usually the platform default that is wanted, which is not UTF-8 for very many users (not sure about newer Windows libs; in the old libs one had the choice between 8-bit and 16-bit Windows codepages IIRC). So encoding consciousness is needed.

- En-/decryption and (de)compression libs, binary serialization libs, etc. Here, UTF-8 auto-conversion would be incredibly useful (i.e. if one wants to encrypt or compress strings and read them back again into the same environment they came from).

- Text parsing/serialization libs: one would need to be conscious about encoding one way or another; likely the encoding would have to be part of the API, or in some cases one would deal with bytes in Cython.
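To make the filename case concrete: the right conversion there is platform-dependent rather than UTF-8. A minimal plain-Python sketch of being explicit about it (the helper name is hypothetical, just for illustration):

```python
import sys

def filename_for_c_api(filename):
    """Encode a unicode filename to bytes for a C API taking char*.

    Uses the platform's filesystem encoding rather than assuming
    UTF-8, since e.g. Windows code pages differ from UTF-8.
    """
    if isinstance(filename, bytes):
        return filename  # already raw bytes, pass through unchanged
    return filename.encode(sys.getfilesystemencoding())
```

The point is only that the conversion site is explicit and names its encoding; an auto-conversion scheme hard-wired to UTF-8 would silently do the wrong thing here on non-UTF-8 platforms.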
Internal Cython use cases: all in all, using Python strings seems better when not dealing with external C code, and I've failed to find good use cases; perhaps anyone else has got one?

- Using char* rather than unicode for optimization purposes. Early binding of unicode objects (typedef str s) should deal with some of these cases, if something like this doesn't happen already as with list. (Will it be as efficient as copying between buffers with strcat and friends? I can imagine it being more efficient, due to less copying potentially happening with a smarter string type...)

- Then there are cases where one wants to do some string modification quickly, element by element. But almost all cases I could think of would fail on a UTF-8 char* (string reversal, palindrome creation, merging strings character by character, alphabet-based ROT-13... all such things would fail with a naive UTF-8 char*, and if one is conscious enough about UTF-8 to do these properly, one should be able to convert explicitly as well).

Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
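P.S. The string-reversal point above is easy to demonstrate in plain Python: reversing the raw UTF-8 bytes corrupts any multi-byte sequence, while reversing after an explicit decode is fine. A small sketch:

```python
def naive_reverse(raw):
    """Reverse a UTF-8 byte string byte by byte (the broken approach)."""
    return raw[::-1]

def aware_reverse(raw):
    """Decode to unicode, reverse code points, re-encode (explicit approach)."""
    return raw.decode("utf-8")[::-1].encode("utf-8")

s = "h\u00e9llo".encode("utf-8")  # '\u00e9' is a two-byte sequence in UTF-8

# Byte-wise reversal splits that two-byte sequence, yielding invalid UTF-8:
try:
    naive_reverse(s).decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False
```

Here naive_is_valid ends up False, while aware_reverse(s) decodes back to "oll\u00e9h" as expected; the naive version only happens to work on pure-ASCII input.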
