Re: [Numpy-discussion] proposal: smaller representation of string arrays

Aldcroft, Thomas Mon, 24 Apr 2017 11:57:38 -0700

On Mon, Apr 24, 2017 at 2:47 PM, Robert Kern wrote:


> On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas wrote:
> >
> > On Mon, Apr 24, 2017 at 1:04 PM, Chris Barker wrote:
>
> >> - round-tripping of binary data (at least with Python's
> encoding/decoding) -- ANY string of bytes can be decoded as latin-1 and
> re-encoded to get the same bytes back. You may get garbage, but you won't
> get a UnicodeDecodeError.
> >
> > +1.  The key point is that there is a HUGE amount of legacy science data
> in the form of FITS (an astronomy-specific binary file format that has been
> the primary file format for 20+ years) and HDF5, both of which use a
> character data type to store data that can contain any byte value 0-255.
> Getting a decoding/encoding error when trying to deal with these datasets
> is a non-starter from my perspective.
>
> That says to me that these are properly represented by `bytes` objects,
> not `unicode/str` objects encoding to and decoding from a hardcoded latin-1
> encoding.
>
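The round-trip property quoted above is easy to check in plain Python. A minimal sketch, standard library only (the helper name `latin1_roundtrip` is made up for illustration): it decodes all 256 byte values as latin-1 and re-encodes them, then shows the "garbage" case and the strict UTF-8 failure. Keeping the data as `bytes`, as Robert suggests, skips the decode step entirely; the latin-1 mapping is what makes a `str` view of the same data lossless.

```python
def latin1_roundtrip(raw: bytes) -> bytes:
    # latin-1 maps each byte 0x00-0xFF to the code point with the same value,
    # so decoding never fails and re-encoding restores the original bytes.
    return raw.decode("latin-1").encode("latin-1")


all_bytes = bytes(range(256))
assert latin1_roundtrip(all_bytes) == all_bytes

# "You may get garbage": UTF-8 data read as latin-1 becomes mojibake,
# e.g. the degree sign (UTF-8 bytes C2 B0) shows up as two characters.
assert "°".encode("utf-8").decode("latin-1") == "Â°"

# By contrast, a strict UTF-8 decode of arbitrary bytes raises.
try:
    all_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)
```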

If you could go back 30 years and get every scientist in the world to do
the right thing, then sure.  But we are living in a messy world right now
with messy legacy datasets that have character type data that are *mostly*
ASCII, but not infrequently contain non-ASCII characters.
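For the mixed case described here, a minimal NumPy sketch, assuming the column has already been read into a fixed-width `'S'` array (the values are invented for illustration, not taken from a real FITS or HDF5 file): a strict ASCII decode fails on the one non-ASCII byte, while latin-1 decodes every element and encodes back to identical bytes. The last assertions show the storage cost behind this thread: NumPy's `'U'` dtype stores 4 bytes per character versus 1 for `'S'`.

```python
import numpy as np

# Hypothetical legacy column: fixed-width byte strings, mostly ASCII, with one
# value holding a non-ASCII byte (0xB0, the degree sign in latin-1).
raw = np.array([b"M31", b"NGC 1275", b"T=20\xb0C"], dtype="S8")

# A strict ASCII decode raises on the 0xB0 byte.
try:
    np.char.decode(raw, "ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# latin-1 decodes every element and round-trips to the same bytes.
text = np.char.decode(raw, "latin-1")
assert np.array_equal(np.char.encode(text, "latin-1"), raw)

# Storage per element: 'S8' is 8 bytes, 'U8' is 32 bytes (4 bytes per character).
assert np.dtype("S8").itemsize == 8
assert np.dtype("U8").itemsize == 32
print(ascii_ok, text)
```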

So I would beg that we actually move forward with a pragmatic solution that
addresses the very real and consequential problems we face, instead of
waiting/praying for a perfect solution.

- Tom


>
> --
> Robert Kern
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion