2017-04-25 12:34 GMT-04:00 Chris Barker <chris.bar...@noaa.gov>:
> I am totally euro-centric, but as I understand it, that is the whole point
> of the desire for a compact one-byte-per character encoding. If there is a
> strong need for other 1-byte encodings (shift-JIS, maybe?) then maybe we
> should support that. But this all started with "mostly ascii". My take on
> that is:

But Shift-JIS is not one-byte; it's two-byte (unless you allow only
half-width characters and nothing else). :-)

In fact legacy CJK encodings are all nominally two-byte (so that the width
of a character's internal representation matches that of its visual
representation).

> - filenames
>
> File names are one of the key reasons folks struggled with the python3 data
> model (particularly on *nix) and why 'surrogateescape' was added. It's
> pretty common to store filenames in with our data, and thus in numpy arrays
> -- we need to preserve them exactly and display them mostly right. Again,
> euro-centric, but if you are euro-centric, then latin-1 is a good choice for
> this.

This I don't understand. As far as I can tell non-Western-European
filenames are not unusual. If filenames are a reason, even if you're
euro-centric (think Eastern Europe, say) I don't see how latin1 is a good
choice.

Lurker here, and I haven't touched numpy in ages. So I might be blurting
out nonsense.

-- 
Ambrose Li // http://o.gniw.ca / http://gniw.ca
If you saw this on CE-L: You do not need my permission to quote me,
only proper attribution. Always cite your sources, even if you have
to anonymize and/or cite it as "personal communication".
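
For anyone who wants to check the two points above (byte widths under
Shift-JIS, and how latin-1 copes with a non-Western-European filename),
here is a minimal sketch. It assumes CPython's built-in "shift_jis" and
"latin-1" codecs; the helper names (sjis_width, fits_latin1) and the sample
filenames are made up for illustration and are not part of numpy.

```python
def sjis_width(ch):
    """Number of bytes one character occupies when encoded as Shift-JIS."""
    return len(ch.encode("shift_jis"))


def fits_latin1(name):
    """True if every character of `name` can be represented in latin-1."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample characters, written as escapes to keep the file ASCII.
widths = {
    "ascii A": sjis_width("A"),
    "half-width katakana a": sjis_width("\uff71"),
    "full-width katakana a": sjis_width("\u30a2"),
    "kanji": sjis_width("\u6f22"),
}
for label, width in widths.items():
    print(f"{label}: {width} byte(s)")

# Hypothetical filenames: one Western European, one Polish.
western = "caf\u00e9.txt"
polish = "\u0141\u00f3d\u017a.txt"
print(fits_latin1(western), fits_latin1(polish))

# latin-1 maps every byte value to a code point, so raw filename bytes
# survive a decode/encode round trip even when the displayed text is wrong.
raw = polish.encode("utf-8")
shown = raw.decode("latin-1")
roundtrip_ok = shown.encode("latin-1") == raw
print(roundtrip_ok, shown != polish)
```

The last part is the trade-off being argued over: treating the bytes as
latin-1 does preserve them exactly, but the Polish name is not displayed
correctly, and the name cannot be encoded to latin-1 from text at all.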