On 20.04.2017 20:53, Robert Kern wrote:
> On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor
> <[email protected] <mailto:[email protected]>>
> wrote:
>
>> Do you have comments on how to go forward, in particular in regards to
>> new dtype vs modify np.unicode?
>
> Can we restate the use cases explicitly? I feel like we ended up with
> the current sub-optimal situation because we never really laid out the
> use cases. We just felt like we needed bytestring and unicode dtypes,
> more out of completionism than anything, and we made a bunch of
> assumptions just to get each one done. I think there may be broad
> agreement that many of those assumptions are "wrong", but it would be
> good to reference that against concretely-stated use cases.

We ended up in this situation because we did not take the opportunity to break compatibility when Python 3 support was added. We should have made the string dtype an encoded byte type (ascii or latin1) in Python 3 instead of null-terminated unencoded bytes, which do not make much practical sense.

So the use case is very simple: give users of the string dtype a migration path that does not involve converting to full UTF-32 unicode. The latin1-encoded bytes dtype would allow that.

As we already have the infrastructure, this same dtype can support more than just latin1 with minimal effort. For the fixed-size encodings that Python supports, it is literally a matter of adding an enum entry, two new switch clauses, a little bit of dtype string parsing, and test cases.

Having some form of variable-length string handling would be nice, but that is another topic altogether. Adding builtin support only for variable-length strings seems like overkill, as the string dtype is not that important and object arrays should already work reasonably well for this use case.
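
To make the size argument concrete, here is a minimal sketch in plain Python (no NumPy) of what a fixed-width, latin1-encoded field stores compared with the same text held as fixed-width UTF-32. The helpers `pack_fixed` and `unpack_fixed` are hypothetical names made up for this illustration; they are not NumPy API and only approximate how such a dtype might lay out its data.

```python
# Hypothetical helpers for illustration only, not part of NumPy.

def pack_fixed(text, width, encoding="latin1"):
    """Encode text and pad it with NUL bytes to exactly `width` bytes."""
    raw = text.encode(encoding)
    if len(raw) > width:
        raise ValueError("encoded text does not fit in the field")
    return raw.ljust(width, b"\x00")


def unpack_fixed(field, encoding="latin1"):
    """Strip the trailing NUL padding and decode back to str."""
    return field.rstrip(b"\x00").decode(encoding)


words = ["numpy", "café", "dtype"]
width = max(len(w) for w in words)  # latin1 uses one byte per character

# One contiguous buffer per representation, every item padded to `width`.
latin1_buf = b"".join(pack_fixed(w, width) for w in words)
utf32_buf = b"".join(w.ljust(width, "\x00").encode("utf-32-le") for w in words)

# Slice the latin1 buffer back into fields and decode each one.
roundtrip = [
    unpack_fixed(latin1_buf[i:i + width])
    for i in range(0, len(latin1_buf), width)
]

print(len(latin1_buf), len(utf32_buf), roundtrip)
```

For text that fits in latin1, the encoded buffer is a quarter of the size of the UTF-32 one and still round-trips to `str`, which is the migration path described above. Text outside latin1 would raise `UnicodeEncodeError` in this sketch.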
_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.python.org/mailman/listinfo/numpy-discussion
