On Wed, Jan 15, 2014 at 9:57 AM, Charles R Harris <charlesr.har...@gmail.com > wrote:
> There was a discussion of this long ago and UCS-4 was chosen as the numpy
> standard. There are just too many complications that arise in supporting
> both.

fair enough -- but loadtxt appears to be broken just the same. Any proposals for that?

My proposal:

loadtxt accepts an encoding argument.

The default is ascii -- that's what it's doing now, anyway, yes?

If the file is encoded ascii, then a one-byte-per-character dtype is used for text data, unless the user specifies otherwise (do they need to specify anyway?)

If the file has another encoding, then the default dtype for text is unicode.

Not sure about other one-byte-per-character encodings (e.g. latin-1).

The defaults may be moot if loadtxt doesn't have auto-detection of text in a file anyway.

This all requires that there be an obvious way for the user to spell the one-byte-per-character dtype -- I think 'S' will do it.

Note to OP: what happens if you specify 'S' for your dtype, rather than str? It works for me on py2:

In [16]: np.loadtxt('pathlist.txt', dtype='S')
Out[16]:
array(['C:\\Users\\Documents\\Project\\mytextfile1.txt',
       'C:\\Users\\Documents\\Project\\mytextfile2.txt',
       'C:\\Users\\Documents\\Project\\mytextfile3.txt'],
      dtype='|S42')

Note: this leaves us with the question of what to pass back to the user when they index into an array of type 'S*' -- a bytes object or a unicode object (decoded as ascii). I think a unicode object, in keeping with proper py3 behavior. This would be like what we currently do with, say, floating point numbers: we can store and operate on 32-bit floats, but when one is passed back as a python type, you get the native python float -- 64-bit.

NOTE: another option is to use latin-1 all around, rather than ascii -- you may get garbage from the higher-value bytes, but it won't barf on you.
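For what it's worth, here is a small sketch of the ascii vs. latin-1 trade-off in the last note. It uses plain Python 3 bytes rather than numpy, and the sample byte strings are made up for illustration; they are not from the OP's file.

```python
# Bytes as they might sit in one element of a one-byte-per-character array.
# 0xE9 is "e with acute accent" in latin-1 and is not valid ascii.
raw = b"caf\xe9"

# ascii decoding rejects any byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# latin-1 maps every byte 0x00-0xFF to the code point of the same value,
# so decoding cannot fail and the round trip back to bytes is lossless.
as_latin1 = raw.decode("latin-1")
round_trip = as_latin1.encode("latin-1")

# If the data was really UTF-8, latin-1 still decodes it without error,
# but the result is garbage (two characters where one was intended).
utf8_bytes = "caf\u00e9".encode("utf-8")
garbage = utf8_bytes.decode("latin-1")

print(ascii_ok, round_trip == raw, len(garbage))
```

So latin-1 never raises and never loses bytes, at the cost of silently producing the wrong characters when the file was in some other encoding.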
-Chris

> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov