On Mon, Apr 24, 2017 at 7:41 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Mon, Apr 24, 2017 at 7:23 PM, Robert Kern <robert.k...@gmail.com> wrote:
> > On Mon, Apr 24, 2017 at 7:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >
> >> That said, AFAICT what people actually want in most use cases is support
> >> for arrays that can hold variable-length strings, and the only place where
> >> the current approach is *optimal* is when we need mmap compatibility with
> >> legacy formats that use fixed-width-nul-padded fields (at which point it's
> >> super convenient). It's not even possible to *represent* all Python strings
> >> or bytestrings in current numpy unicode or string arrays (Python
> >> strings/bytestrings can have trailing nuls). So if we're talking about
> >> tweaks to the current system it probably makes sense to focus on this use
> >> case specifically.
> >>
> >> From context I'm assuming FITS files use fixed-width-nul-padding for
> >> strings? Is that right? I know HDF5 doesn't.
> >
> > Yes, HDF5 does. Or at least, it is supported in addition to the
> > variable-length ones.
> >
> > https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/index.html
>
> Doh, I found that page but it was (and is) meaningless to me, so I
> went by http://docs.h5py.org/en/latest/strings.html, which says the
> options are fixed-width ascii, variable-length ascii, or
> variable-length utf-8 ... I guess it's just talking about what h5py
> currently supports.

It's okay, I made exactly the same mistake earlier in the thread. :-)

> But also, is it important whether strings we're loading/saving to an
> HDF5 file have the same in-memory representation in numpy as they
> would in the file? I *know* [1] no-one is reading HDF5 files using
> np.memmap :-). Is it important for some other reason?

The lack of such a dtype seems to be the reason why neither h5py nor
PyTables supports that kind of HDF5 Dataset. The variable-length Datasets
can take up a lot of disk space because they can't be compressed (even
accounting for the padding space that fixed-width storage wastes). I mean,
they probably could have implemented it with object arrays, like h5py does
with the variable-length string Datasets, but they didn't.

https://github.com/PyTables/PyTables/issues/499
https://github.com/h5py/h5py/issues/624

--
Robert Kern
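
For anyone following along, here is a minimal sketch of the two behaviours discussed above: numpy's fixed-width bytes dtype treats trailing nuls as padding, so it cannot round-trip every Python bytestring, while an object array can. The sample values and the 5-byte field width are arbitrary choices for illustration, not anything taken from FITS, HDF5, h5py or PyTables.

```python
import numpy as np

# Fixed-width bytes dtype: every element occupies exactly 5 bytes,
# padded on the right with nul bytes. (Width 5 is an arbitrary choice.)
fixed = np.array([b"abc", b"de\x00"], dtype="S5")

# Trailing nuls are indistinguishable from padding, so the explicit
# trailing nul in b"de\x00" does not survive a round trip.
roundtrip_fixed = fixed.tolist()

# The underlying buffer is exactly a run of fixed-width nul-padded fields,
# which is why this layout is convenient for legacy on-disk formats.
raw = fixed.tobytes()

# The same bytes can be reinterpreted in place, with no parsing step.
view = np.frombuffer(raw, dtype="S5")

# An object array holds references to ordinary Python bytes objects,
# so the trailing nul is preserved, at the cost of a non-contiguous layout.
obj = np.array([b"abc", b"de\x00"], dtype=object)
roundtrip_obj = obj.tolist()

print(roundtrip_fixed)
print(raw)
print(roundtrip_obj)
```

The object array keeps the data intact but gives up the flat, fixed-stride buffer, which is the trade-off between the two storage styles in this thread.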
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion