Perhaps `np.encoded_str[encoding]` as the name for the new type, if we decide a new type is necessary?
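For comparison, NumPy's datetime dtype already carries a parameter (the time unit) in brackets, and `np.datetime_data` recovers it. The sketch below assumes the proposed `encoded_str[<encoding>]` spelling would work the same way. The parser `parse_encoded_str` is hypothetical and not a NumPy API. It accepts only codecs that Python's `codecs` module knows about.

```python
import codecs
import re

import numpy as np

# Existing pattern: datetime64 stores its unit in the dtype name.
unit, count = np.datetime_data(np.dtype("datetime64[ns]"))
print(unit, count)  # ns 1

# Hypothetical: the proposed encoded_str[<encoding>] spelling, parsed the same way.
# Neither the name nor this helper exists in NumPy; both are illustrative only.
_ENCODED_STR = re.compile(r"^encoded_str\[([A-Za-z0-9_.:-]+)\]$")


def parse_encoded_str(name: str) -> str:
    """Return the normalized codec name inside encoded_str[...].

    Raises ValueError if the name does not match the pattern, and
    LookupError (from codecs.lookup) if Python has no such codec.
    """
    match = _ENCODED_STR.match(name)
    if match is None:
        raise ValueError(f"not an encoded_str dtype name: {name!r}")
    # codecs.lookup validates the codec and normalizes its name.
    return codecs.lookup(match.group(1)).name


print(parse_encoded_str("encoded_str[utf-8]"))  # utf-8
```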
Am I right in thinking that the general problem here is that it's very easy to discard metadata when working with dtypes, and that by adding metadata to `unicode_` we risk existing code carelessly dropping it? Is this a problem in both C and Python, or just in C? If so, could we end up with a compromise where careless old code simply promotes to UCS-4 (UTF-32)?

On Thu, 20 Apr 2017 at 20:09 Anne Archibald <peridot.face...@gmail.com> wrote:

> On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor <
> jtaylor.deb...@googlemail.com> wrote:
>
>> I probably have formulated my goal with the proposal a bit better, I am
>> not very interested in a repetition of which encoding to use debate.
>> In the end what will be done allows any encoding via a dtype with
>> metadata like datetime.
>> This allows any codec (including truncated utf8) to be added easily (if
>> python supports it) and allows sidestepping the debate.
>>
>> My main concern is whether it should be a new dtype or modifying the
>> unicode dtype. Though the backward compatibility argument is strongly in
>> favour of adding a new dtype that makes the np.unicode type redundant.
>>
>
> Creating a new dtype to handle encoded unicode, with the encoding
> specified in the dtype, sounds perfectly reasonable to me. Changing the
> behaviour of the existing unicode dtype seems like it's going to lead to
> massive headaches unless exactly nobody uses it. The only downside to a new
> type is having to find an obvious name that isn't already in use. (And
> having to actively maintain/deprecate the old one.)
>
> Anne
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
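The "careless code promotes to UCS-4" compromise can be sketched at the Python level. `EncodedStrArray` is a hypothetical stand-in for the proposed dtype, not a NumPy API. It keeps fixed-width encoded bytes together with the encoding. Any consumer that does not understand the encoding (modelled here as plain `np.asarray`) gets a decoded `np.str_` (UCS-4) array instead of raw bytes. This does not address the C-level question of metadata being dropped inside NumPy itself.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class EncodedStrArray:
    """Hypothetical container: encoded bytes plus the encoding that produced them.

    Illustrative stand-in for an encoded_str[encoding] dtype; not a NumPy API.
    """

    raw: np.ndarray  # 'S<n>' array of encoded bytes, one element per string
    encoding: str

    @classmethod
    def from_strings(cls, strings, encoding="utf-8"):
        # np.char.encode encodes each str element with the given codec.
        raw = np.char.encode(np.asarray(strings, dtype=np.str_), encoding)
        return cls(raw, encoding)

    def to_ucs4(self):
        # Decode back to NumPy's existing UCS-4 unicode dtype.
        return np.char.decode(self.raw, self.encoding)

    def __array__(self, dtype=None, copy=None):
        # Code unaware of the encoding (e.g. np.asarray(obj)) lands here and
        # receives a UCS-4 array, so the bytes are never misinterpreted.
        out = self.to_ucs4()
        return out if dtype is None else out.astype(dtype)


arr = EncodedStrArray.from_strings(["abc", "héllo"])
print(arr.raw.dtype.kind)         # S
print(np.asarray(arr).dtype.kind) # U
```

The design choice here is that losing the encoding degrades to a wider, lossless representation rather than to undecoded bytes.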