On Thu, Apr 20, 2017 at 12:17 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Thu, Apr 20, 2017 at 12:05 PM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> > On Thu, Apr 20, 2017 at 11:53 AM, Robert Kern <robert.k...@gmail.com> wrote:
> >
> >> I don't know of a format off-hand that works with numpy uniform-length strings and Unicode as well. HDF5 (to my recollection) supports arrays of NULL-terminated, uniform-length ASCII like FITS, but only variable-length UTF-8 strings.
> >
> > HDF5 supports two character sets, ASCII and UTF-8. Both come in fixed- and variable-length versions:
> > https://github.com/PyTables/PyTables/issues/499
> > https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/index.html
> >
> > "Fixed length UTF-8" for HDF5 refers to the number of bytes used for storage, not the number of characters.
>
> Ah, okay, I was interpolating from a quick perusal of the h5py docs, which of course are also constrained by numpy's current set of dtypes. The NULL-terminated ASCII works well enough with np.string_'s semantics.

Yes, except that on Python 3, "Fixed length ASCII" in HDF5 should correspond to a text (str) type, not np.string_ (which is really bytes).
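A small sketch in plain NumPy (no HDF5 library involved) of the two distinctions discussed above. First, a fixed-length field measured in bytes, as HDF5 fixed-length UTF-8 is, can cut a multi-byte character in half. Second, NumPy's "S" dtype, whose scalar type is np.bytes_ (np.string_ was an alias for it and was removed in NumPy 2.0), yields bytes on Python 3, and decoding gives the "U" (str) array a Python 3 user would expect. The sample strings and the "S11" width are made up for illustration; the "S11" array only stands in for an HDF5 fixed-length field and is not how h5py or PyTables actually map HDF5 types.

```python
import numpy as np

# Sample text with two non-ASCII characters (U+00EF and U+00E9),
# written with escapes so the code points are unambiguous.
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")
n_chars = len(text)     # 10 characters
n_bytes = len(encoded)  # 12 bytes: each accented letter takes 2 bytes

# A fixed-length field sized in BYTES (an analogy for HDF5 fixed-length UTF-8).
# NumPy silently truncates bytes that do not fit into an "S11" element,
# which here splits the final 2-byte character in half.
field = np.array([encoded], dtype="S11")
try:
    field[0].decode("utf-8")
    truncation_broke_utf8 = False
except UnicodeDecodeError:
    truncation_broke_utf8 = True

# Fixed-length ASCII maps naturally onto NumPy's "S" dtype, but on
# Python 3 its elements are bytes (np.bytes_ subclasses bytes), not str.
ascii_field = np.array([b"abc", b"de"], dtype="S3")
elements_are_bytes = isinstance(ascii_field[0], bytes)

# Decoding produces a "U" (str) array, i.e. a text type on Python 3.
as_text = np.char.decode(ascii_field, "ascii")
```

Sizing such a field by character count rather than byte count is exactly the mismatch the HDF5 note warns about.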
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion