On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
[clip]
> imagine I'm working with a non-latin default encoding, and I've opened a
> file:
>
> fobj = open('my_nonlatin.txt', 'rt')
>
> in Python 3.2. That might contain numbers and non-latin text. I can't
> pass that into 'genfromtxt' because it will give me this error above. I
> can pass it in as binary but then I'll get garbled text.

That's the way it also works on Python 2. The text is not garbled -- it's
just in some binary representation that you can later on decode to
unicode:

>>> np.array(['asd']).view(np.chararray).decode('utf-8')
array([u'asd'], dtype='<U3')

Granted, utf-16 and its ilk might be problematic.

> Should those functions also allow unicode-providing files (perhaps with
> binary as default for speed)?

Nobody has yet asked for this feature as far as I know, so I guess the
need for it is pretty low.

Personally, I don't think going unicode makes much sense here. First, it
would be a Py3-only feature. Second, there is a real need for it only
when dealing with multibyte encodings, which are seldom used these days
with utf-8 rightfully dominating.

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
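
For reference, the decode step described in the reply can be sketched on Python 3 roughly as follows. This is only an illustration under assumptions: the sample values and the names `raw` and `text` are made up, the bytes are assumed to be utf-8, and `np.char.decode` is used here in place of the `chararray` view shown in the reply.

```python
import numpy as np

# Hypothetical sample: byte strings as they might come back from a file
# opened in binary mode (assumed to be utf-8 encoded).
raw = np.array(["äö".encode("utf-8"), b"asd"])

# Decode element-wise from bytes (dtype kind 'S') to unicode (dtype kind 'U').
text = np.char.decode(raw, "utf-8")

print(raw.dtype.kind, text.dtype.kind)
print(text.tolist())
```

The byte array carries no encoding information of its own, so the caller has to know (or guess) the encoding at decode time.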