On Apr 26, 2017 9:30 AM, "Chris Barker - NOAA Federal" < chris.bar...@noaa.gov> wrote:
> UTF-8 does not match the character-oriented Python text model. Plenty of people argue that that isn't the "correct" model for Unicode text -- maybe so, but it is the model python 3 has chosen. I wrote a much longer rant about that earlier.
>
> So I think the easy to access, and particularly defaults, numpy string dtypes should match it.

This seems a little vague? The "character-oriented Python text model" is just that str supports O(1) indexing of characters. But... Numpy doesn't. If you want to access individual characters inside a string inside an array, you have to pull out the scalar first, at which point the data is copied and boxed into a Python object anyway, using whatever representation the interpreter prefers. So AFAICT it makes literally no difference to the user whether numpy's internal representation allows for fast character access.

-n
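To make the point about scalar extraction concrete, here is a minimal sketch. The array contents and variable names are made up for illustration; it only assumes NumPy's standard fixed-width unicode dtype. Indexing the array yields a whole element as a scalar, and character indexing then happens on that scalar, not on the array's internal buffer.

```python
import numpy as np

# Hypothetical example data; NumPy infers a fixed-width unicode dtype (kind 'U').
arr = np.array(["héllo", "wörld"])

# Indexing the array returns a whole element as a scalar (numpy.str_),
# which is a subclass of Python's built-in str.
item = arr[0]

# Character access is performed on the extracted scalar.
ch = item[1]

print(arr.dtype.kind)          # kind code of the array's dtype
print(type(item))              # the scalar type returned by indexing
print(isinstance(item, str))   # whether the scalar behaves as a Python str
print(ch)                      # the second character of the first element
```

Nothing in this access pattern depends on how the array stores its bytes internally, which is the situation described above.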