On Mon, Apr 24, 2017 at 2:47 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas
> <aldcr...@head.cfa.harvard.edu> wrote:
> >
> > On Mon, Apr 24, 2017 at 1:04 PM, Chris Barker <chris.bar...@noaa.gov>
> > wrote:
> >
> >> - round-tripping of binary data (at least with Python's
> >> encoding/decoding) -- ANY string of bytes can be decoded as latin-1
> >> and re-encoded to get the same bytes back. You may get garbage, but
> >> you won't get an EncodingError.
> >
> > +1. The key point is that there is a HUGE amount of legacy science
> > data in the form of FITS (an astronomy-specific binary file format
> > that has been the primary file format for 20+ years) and HDF5, which
> > use a character data type to store data that can be bytes 0-255.
> > Getting a decoding/encoding error when trying to deal with these
> > datasets is a non-starter from my perspective.
>
> That says to me that these are properly represented by `bytes` objects,
> not `unicode/str` objects encoding to and decoding from a hardcoded
> latin-1 encoding.

If you could go back 30 years and get every scientist in the world to do
the right thing, then sure. But we are living in a messy world right now,
with messy legacy datasets that have character-type data that are *mostly*
ASCII but not infrequently contain non-ASCII characters.

So I would beg to actually move forward with a pragmatic solution that
addresses very real and consequential problems that we face, instead of
waiting/praying for a perfect solution.

- Tom

> --
> Robert Kern
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
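
For anyone who wants to check the round-tripping claim quoted above, here is a minimal sketch using only Python 3's built-in codecs. The sample value `legacy` and the helper `decodes_cleanly` are made up for illustration; they are not taken from any FITS or HDF5 file or library. Note that the error Python actually raises on a failed decode is `UnicodeDecodeError`.

```python
# Every possible byte value, 0 through 255.
raw = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding cannot fail and
# re-encoding returns exactly the original bytes.
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` without an error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical legacy value: mostly ASCII, with two non-ASCII bytes.
legacy = b"\xc5ngstr\xf6m"

print(decodes_cleanly(legacy, "ascii"))    # strict ASCII rejects bytes >= 0x80
print(decodes_cleanly(legacy, "utf-8"))    # not a valid UTF-8 sequence either
print(decodes_cleanly(legacy, "latin-1"))  # latin-1 accepts any byte string
```

Whether the decoded text is meaningful still depends on what encoding the data was really written in; latin-1 only guarantees that nothing is lost or rejected on the way through.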