On 20.04.2017 20:59, Anne Archibald wrote:
> On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor
> <jtaylor.deb...@googlemail.com> wrote:
>
> > I probably have formulated my goal with the proposal a bit better, I am
> > not very interested in a repetition of which encoding to use debate.
> > In the end what will be done allows any encoding via a dtype with
> > metadata like datetime.
> > This allows any codec (including truncated utf8) to be added easily (if
> > python supports it) and allows sidestepping the debate.
> >
> > My main concern is whether it should be a new dtype or modifying the
> > unicode dtype. Though the backward compatibility argument is strongly in
> > favour of adding a new dtype that makes the np.unicode type redundant.
>
> Creating a new dtype to handle encoded unicode, with the encoding
> specified in the dtype, sounds perfectly reasonable to me. Changing the
> behaviour of the existing unicode dtype seems like it's going to lead to
> massive headaches unless exactly nobody uses it. The only downside to a
> new type is having to find an obvious name that isn't already in use.
> (And having to actively maintain/deprecate the old one.)
>
> Anne
>
We wouldn't really be changing the behaviour of the unicode dtype. Only programs that access the data buffer directly and decode it themselves would need to change. I assume this applies to programs that do serialization + reencoding of numpy string arrays at the C level (at the Python level you would be fine). These programs would break, but only when they actually receive a string array that does not use the default UTF-32 encoding.

I really don't like that a completely new dtype means adding more junk and extra code paths to numpy. But modifying the existing dtype is probably too big a compatibility break to accept just to keep our code clean.
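To make the breakage concrete, here is a minimal sketch of the kind of consumer I mean, written in Python rather than C for brevity. The helper name `decode_elements` is made up for this example. It reads the raw buffer of a unicode array and hardcodes the assumption that every element is fixed-width UTF-32, which is how the current unicode dtype stores its data. Code like this is what would stop working if the same dtype could carry another encoding.

```python
import sys

import numpy as np


def decode_elements(arr):
    """Decode a NumPy unicode array from its raw buffer.

    Hypothetical helper: it assumes every element is stored as
    fixed-width UTF-32, 4 bytes per character, padded with NULs.
    """
    # Pick the codec matching the array's byte order.
    order = arr.dtype.byteorder
    if order == ">":
        codec = "utf-32-be"
    elif order == "<":
        codec = "utf-32-le"
    else:  # "=" means native byte order
        codec = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"

    raw = arr.tobytes()
    size = arr.dtype.itemsize
    out = []
    for i in range(arr.size):
        chunk = raw[i * size:(i + 1) * size]
        # Strip the NUL padding used for shorter strings.
        out.append(chunk.decode(codec).rstrip("\x00"))
    return out


arr = np.array(["abc", "né"])   # dtype is U3
print(arr.dtype.itemsize)       # 3 characters * 4 bytes each
print(decode_elements(arr))
```

Nothing here goes through the Python-level element access, so nothing would tell this code that the buffer layout had changed.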
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion