Re: [Numpy-discussion] proposal: smaller representation of string arrays

Eric Wieser Thu, 20 Apr 2017 12:16:19 -0700

Perhaps `np.encoded_str[encoding]` as the name for the new type, if we
decide a new type is necessary?


Am I right in thinking that the general problem here is that it's very easy
to discard metadata when working with dtypes, and that by adding metadata
to `unicode_`, we risk existing code carelessly dropping it? Is this a
problem in both C and Python, or just in C?
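One simple way to see how easily metadata goes missing: NumPy's existing `metadata` argument to `np.dtype` can tag the current UCS-4 `U` dtype, but the common idiom of rebuilding a dtype from its type string (`dt.str`) returns a plain dtype without it. This is only a sketch; the `"encoding"` key is a hypothetical stand-in for whatever the proposal would actually store.

```python
import numpy as np

# Tag the existing UCS-4 unicode dtype with hypothetical encoding metadata.
dt = np.dtype("U5", metadata={"encoding": "utf-8"})
print(dt.metadata["encoding"])  # utf-8
print(dt.itemsize)              # 20: four bytes per character

# Rebuilding a dtype from its type string silently drops the metadata.
rebuilt = np.dtype(dt.str)
print(rebuilt == np.dtype("U5"), rebuilt.metadata)  # True None
```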

If that's the case, can we reach a compromise where old code that is
careless about the metadata simply ends up promoting to UCS-4?
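A rough illustration of why such a fallback would be safe, though costly, uses today's tools as a stand-in for the proposed dtype: plain `S` byte strings plus explicit `np.char.encode`/`np.char.decode`. The encoded array is much smaller, and decoding back to the existing UCS-4 `U` dtype loses nothing.

```python
import numpy as np

words = np.array(["abc", "héllo"])        # existing UCS-4 dtype, <U5
encoded = np.char.encode(words, "utf-8")  # UTF-8 byte strings, |S6
print(words.itemsize, encoded.itemsize)   # 20 6

# Promoting back to UCS-4 restores the original values exactly.
restored = np.char.decode(encoded, "utf-8")
print(restored.dtype, (restored == words).all())
```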

On Thu, 20 Apr 2017 at 20:09 Anne Archibald <[email protected]>
wrote:

> On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor <
> [email protected]> wrote:
>
>> I probably should have formulated the goal of my proposal a bit better;
>> I am not very interested in repeating the which-encoding-to-use debate.
>> In the end, what will be done is to allow any encoding via a dtype with
>> metadata, like datetime.
>> This allows any codec (including truncated utf8) to be added easily (if
>> Python supports it) and sidesteps the debate.
>>
>> My main concern is whether it should be a new dtype or a modification of
>> the unicode dtype. That said, the backward compatibility argument strongly
>> favours adding a new dtype, which would make the np.unicode type redundant.
>>
>
> Creating a new dtype to handle encoded unicode, with the encoding
> specified in the dtype, sounds perfectly reasonable to me. Changing the
> behaviour of the existing unicode dtype seems like it's going to lead to
> massive headaches unless exactly nobody uses it. The only downside to a new
> type is having to find an obvious name that isn't already in use. (And
> having to actively maintain/deprecate the old one.)
>
> Anne
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
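The "dtype with metadata, like datetime" idea might look roughly like the sketch below. `datetime64` genuinely keeps its unit inside the dtype; everything else is hypothetical: `encoded_str_dtype`, `encode_as`, `decode_as` and the `"encoding"` key are illustrative names, not the proposal's API, and a plain `S` dtype with NumPy's `metadata` argument stands in for a real new type. `codecs.lookup` rejects any codec Python does not support, which matches the "if Python supports it" condition.

```python
import codecs
import numpy as np

# datetime64 already stores a parameter (its time unit) inside the dtype.
print(np.datetime_data(np.dtype("datetime64[ms]")))  # ('ms', 1)

def encoded_str_dtype(encoding, nbytes):
    """Hypothetical: a fixed-width byte-string dtype tagged with its codec."""
    codecs.lookup(encoding)  # raises LookupError if Python has no such codec
    return np.dtype(f"S{nbytes}", metadata={"encoding": encoding})

def encode_as(dtype, strings):
    """Hypothetical: encode str values with the codec recorded in `dtype`."""
    return np.char.encode(strings, dtype.metadata["encoding"]).astype(dtype)

def decode_as(dtype, raw):
    """Hypothetical: decode bytes back to the existing UCS-4 str dtype."""
    return np.char.decode(raw, dtype.metadata["encoding"])

dt = encoded_str_dtype("latin-1", 8)
raw = encode_as(dt, np.array(["café", "naïve"]))
print(raw.itemsize)        # 8
print(decode_as(dt, raw))  # ['café' 'naïve']
```

The helpers read the codec from the dtype object passed in rather than from `raw.dtype`, so the sketch does not depend on the metadata surviving array operations.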