Julian -- beat me to it!

On Wed, Jan 15, 2014 at 10:25 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
> On 15.01.2014 18:57, Charles R Harris wrote:
> > There was a discussion of this long ago and UCS-4 was chosen as the
> > numpy standard. There are just too many complications that arise in
> > supporting both.

Supporting both UCS-4 and UCS-2 would be more pain than it's worth.

> In python3 you need extra code to deal with arrays containing strings as
> the S type is interpreted as bytes which is not a string type anymore [0].

Ouch! I was just assuming that it still was -- yes, I really think we need a one-byte-per-char string type -- probably ascii, but we could do latin-1 and let the buyer beware of the higher-value bytes.

> Someone on irc (I think Freddie Witherden CC'd) had a use case with huge
> ascii tables in numpy which now have to be stored as 4 bytes unicode on
> disk or decode bytes all the time.

And ascii data is not the least bit rare, in the science world in particular.

> I personally don't use strings in arrays so I can neither judge the
> impact nor the use, but it seems to me like at least having an ascii
> dtype for python2<->python3 compatibility would be useful.

I think py2<->py3 compatibility is a separate issue -- we should have this if it's a good thing to have, not because of that. And it is a good thing to have.

And since this is a new thread -- regardless of the decision on this, loadtxt is broken -- we certainly should be able to parse ascii text and return something reasonable -- unicode strings would have been fine in the OP's case, if they didn't have the extra bytes-to-string crap in them.

[0] https://github.com/numpy/numpy/issues/4162

From that:

    The transition towards split string/bytes types in Python 3 has the
    unfortunate side effect of breaking the following snippet:

        np.array("Hello", dtype="|S").item() == "Hello"

Sorry for not testing in py3, but this makes it look like the "S" dtype still holds one-byte-per-char strings, but .item() creates a bytes object rather than a unicode (py3 str) object.
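For reference, here is a minimal sketch of what that looks like under Python 3. The variable names and the small `table` array are made up for illustration and are not taken from the issue; it assumes a NumPy that still provides `np.char.decode`.

```python
import numpy as np

# "S" stores one byte per character; "U" stores UCS-4, four bytes per character.
s_arr = np.array("Hello", dtype="S")
u_arr = np.array("Hello", dtype="U")
print(s_arr.dtype.itemsize, u_arr.dtype.itemsize)

# On Python 3, .item() on an "S" array hands back bytes, not str,
# so the comparison from the issue is False until you decode.
item = s_arr.item()
print(type(item), item == "Hello", item.decode("ascii") == "Hello")

# Hypothetical ascii table: compact as "S", decoded to "U" only when needed.
table = np.array([b"H2O", b"CO2"])
decoded = np.char.decode(table, "ascii")
print(table.dtype, decoded.dtype)
```

So the one-byte-per-char storage is there; the cost is the explicit decode whenever you want py3 str objects back.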
As in my other note, I think it would be better to have it return a unicode string by default. But it looks like you can still use it to store large quantities of ascii data if you want.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov