Re: [Numpy-discussion] proposal: smaller representation of string arrays

Julian Taylor Thu, 20 Apr 2017 12:41:12 -0700

On 20.04.2017 20:59, Anne Archibald wrote:
> On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor
> <jtaylor.deb...@googlemail.com <mailto:jtaylor.deb...@googlemail.com>>
> wrote:
> 
>     I probably should have formulated my goal with the proposal a bit better; I am
>     not very interested in a repetition of the which-encoding-to-use debate.
>     In the end, what will be done allows any encoding via a dtype with
>     metadata, like datetime.
>     This allows any codec (including truncated utf8) to be added easily (if
>     Python supports it) and sidesteps the debate.
> 
>     My main concern is whether it should be a new dtype or modifying the
>     unicode dtype. Though the backward compatibility argument is strongly in
>     favour of adding a new dtype that makes the np.unicode type redundant.
> 
> 
> Creating a new dtype to handle encoded unicode, with the encoding
> specified in the dtype, sounds perfectly reasonable to me. Changing the
> behaviour of the existing unicode dtype seems like it's going to lead to
> massive headaches unless exactly nobody uses it. The only downside to a
> new type is having to find an obvious name that isn't already in use.
> (And having to actively maintain/deprecate the old one.)
> 
> Anne
>
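For context on the "metadata like datetime" precedent in the quote: `datetime64` is a single dtype family whose unit is carried as dtype metadata, so arrays with different units share one storage layout but are distinct dtypes. The sketch below only illustrates that existing mechanism. An encoding parameter for a string dtype was still a proposal at this point, so none is shown and no such API should be assumed.

```python
import numpy as np

# The unit ("ms", "s", ...) and its multiplier are parameters of the dtype itself.
ms = np.dtype("datetime64[ms]")
s25 = np.dtype("datetime64[25s]")
print(np.datetime_data(ms))   # ('ms', 1)
print(np.datetime_data(s25))  # ('s', 25)

# Both store a 64-bit integer count; only the metadata gives it meaning.
print(ms.itemsize, s25.itemsize)        # 8 8
print(ms == np.dtype("datetime64[s]"))  # False: same layout, different dtype

# Casting between units reinterprets the stored counts using the metadata.
t = np.array(["2017-04-20T12:41:12"], dtype="datetime64[s]")
print(t.astype(ms))
```

An encoding-aware string dtype along the lines proposed would presumably work the same way: one dtype family, with the codec recorded in the dtype rather than implied by it.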


We wouldn't really be changing the behaviour of the unicode dtype. Only
programs that access the data buffer directly and try to decode it would
need to be changed.

I assume this can happen for programs that serialize and re-encode
NumPy string arrays at the C level (at the Python level you would be
fine).
These programs would be broken, but only when they actually receive a
string array that does not have the default utf32 encoding.
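To make the breakage concrete, here is a small sketch (not part of the proposal) of the kind of raw-buffer reader that would be affected. It relies on the current `U` dtype storing fixed-width items of 4-byte UTF-32 code units; the helper `decode_fixed_width` is hypothetical, and a UTF-8 payload in the existing `S` (bytes) dtype stands in for an encoded string dtype that does not exist yet.

```python
import numpy as np


def decode_fixed_width(buf: bytes, itemsize: int, codec: str) -> list:
    """Hypothetical raw-buffer reader: split `buf` into fixed-width items,
    decode each one with `codec`, and strip the trailing NUL padding."""
    items = []
    for start in range(0, len(buf), itemsize):
        chunk = buf[start:start + itemsize]
        items.append(chunk.decode(codec).rstrip("\x00"))
    return items


# Current unicode dtype: every character occupies 4 bytes (UTF-32).
arr = np.array(["abc", "héllo"], dtype="<U5")
print(arr.dtype.itemsize)  # 20 == 5 characters * 4 bytes
print(decode_fixed_width(arr.tobytes(), arr.dtype.itemsize, "utf-32-le"))  # ['abc', 'héllo']

# Stand-in for an encoded string dtype: UTF-8 bytes in a 6-byte 'S' array.
utf8 = np.array(["abc".encode("utf-8"), "héllo".encode("utf-8")], dtype="S6")
print(decode_fixed_width(utf8.tobytes(), utf8.dtype.itemsize, "utf-8"))  # ['abc', 'héllo']

# A reader that hardcodes UTF-32 fails on the same data.
try:
    decode_fixed_width(utf8.tobytes(), utf8.dtype.itemsize, "utf-32-le")
except UnicodeDecodeError as exc:
    print("hardcoded UTF-32 reader fails:", exc.reason)
```

Only the reader that assumes the default layout breaks; code that asks the array for Python strings never touches the raw bytes, which matches the point that the Python level would be fine.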

I really don't like that a fully new dtype means adding more junk and
extra code paths to NumPy.
But it is probably too big a compatibility break to accept just to keep
our code clean.


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
