This thread is getting a little out of hand, which is my fault for initially mixing different topics in one mail, so let me try to summarize. We have three issues here:

- A loadtxt bug when loading strings in Python 3.
  This has nothing to do with encodings or dtypes. It is a bug that should be fixed, no more, no less. The fix is probably removing a repr() somewhere and converting the data to unicode as the user requested. Since str == unicode in py3, this is the normal change you must account for when migrating to py3.

- No possibility to specify the encoding of a file in loadtxt.
  This is a missing feature. Currently loadtxt uses the system default, which is good and should stay that way. It is only missing an option to tell it to treat the file differently. There should be little debate about changing the default, especially not to latin1. The system default exists for a good reason. Note that on Linux it is UTF-8, which is a good choice. I'm not familiar with Windows, but all programs should at least have the option to use UTF-8 as output too.
  This has nothing to do with indexing or any kind of processing of the numpy arrays.
  The fix should be trivial: just add an encoding keyword argument and pass it on to Python. The workaround should be passing a file object to loadtxt instead of a file name. Python file objects already have the encoding argument.

- Inconvenience in dealing with strings in Python 3.
  Bytes are not strings in Python 3, which means ASCII data is either a byte array, which can be inconvenient to deal with, or 4-byte unicode, which wastes space.
  A proposal to fix this would be to add a one- or two-byte dtype with a specific encoding that behaves similarly to bytes but converts to string when outputting to Python for comparisons etc.
  For backward compatibility we *cannot* change S. Maybe we could change the meaning of 'a', but it would be safer to add a new dtype. Possibly 'S' can be deprecated in favor of 'B' when we have a specific encoding dtype.
  The main issue is probably: is it worth it, and who does the work?
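
To make the second and third points concrete, here is a minimal sketch using only the Python standard library. The sample words, the temporary file, and all variable names are made up for illustration. It does not call loadtxt itself; it only shows how a file object opened with an explicit encoding decodes the data (such an object is what the workaround would hand to loadtxt), and why bytes versus 4-byte unicode is a trade-off in Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample data: "Zürich" and "Málaga", written with escapes.
text = u"Z\u00fcrich\nM\u00e1laga\n"

# Write the file as Latin-1, i.e. not the UTF-8 default mentioned for Linux.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="latin1") as f:
    f.write(text)

# Open with an explicit encoding: the file object itself decodes bytes to str,
# so whatever reads from it never has to guess the encoding.
with io.open(path, "r", encoding="latin1") as f:
    rows = [line.rstrip("\n") for line in f]
os.remove(path)

# In Python 3 a bytes object never compares equal to a str ...
ascii_bytes = b"abc"
bytes_equal_str = (ascii_bytes == "abc")
# ... it has to be decoded first.
decoded_equal_str = (ascii_bytes.decode("ascii") == "abc")

# Storage cost for the same three characters:
# 1 byte per character as ASCII, 4 bytes per character as UTF-32.
ascii_size = len("abc".encode("ascii"))
utf32_size = len("abc".encode("utf-32-le"))

print(rows, bytes_equal_str, decoded_equal_str, ascii_size, utf32_size)
```

The explicit decode step before comparing is the inconvenience described in the third point, and the size difference is the wasted space.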