Hi Chris,

Just stumbled on this discussion (I'm the lead author of h5py).
We would be overjoyed if there were a 1-byte text type available in NumPy. String handling is the source of major pain right now in the HDF5 world. All HDF5 strings are text (opaque types are used for binary data), but we're forced into using the "S" type most of the time because (1) the "U" type doesn't round-trip between HDF5 and NumPy, as there's no fixed-width wide-character string type in HDF5, and (2) "U" takes 4x the space, which is a problem for big scientific datasets.

ASCII-only would be preferable, partly for selfish reasons (HDF5's default is ASCII only), and partly to make it possible to copy the strings into containers labelled "UTF-8" without manually inspecting every value.

> """At the high-level interface, h5py exposes three kinds of strings. Each
> maps to a specific type within Python (but see str_py3 below):
>
> Fixed-length ASCII (NumPy S type)
> ....
> """

> This is wrong, or mis-guided, or maybe only a little confusing -- 'S' is not
> an ASCII string (even though I wish it were...). But clearly the HDF folks
> think we need one!

Yes, this was intended to state that the HDF5 "Fixed-width ASCII" type maps to NumPy "S" at conversion time, which is obviously a wretched solution on Py3.

> >>> dset = f.create_dataset("string_ds", (100,), dtype="S10")
> """
> Pardon my py3 ignorance -- is numpy.string_ the same as 'S' in py3? From
> another post, I thought you'd need to use numpy.bytes_ (which is the same on
> py2)

It does produce an instance of 'numpy.bytes_', although I think the h5py docs should be changed to use bytes_ explicitly.

Andrew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
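
For reference, a minimal sketch of the points raised in the message above, using only NumPy on Python 3 (no h5py or HDF5 file involved). It shows the 4x size difference between "S" and "U", the scalar type an "S" array yields, and the kind of per-value ASCII inspection the message would like to avoid. The helper name is_ascii_only is hypothetical and exists only for this illustration.

```python
import numpy as np

# Fixed-width byte strings ("S") vs. fixed-width Unicode strings ("U"),
# both holding up to 10 characters per element.
s_arr = np.array([b"hello", b"world"], dtype="S10")
u_arr = np.array(["hello", "world"], dtype="U10")

s_size = s_arr.dtype.itemsize  # 1 byte per character
u_size = u_arr.dtype.itemsize  # 4 bytes per character (UCS-4)
print(s_size, u_size)

# On Python 3, elements of an "S" array are numpy.bytes_ (a bytes subclass),
# not text.
print(type(s_arr[0]))


def is_ascii_only(arr):
    """Hypothetical helper: True if every element of an "S" array is pure ASCII.

    This is the manual inspection of every value that an ASCII-only
    dtype would make unnecessary.
    """
    return all(bytes(v).isascii() for v in arr)


print(is_ascii_only(s_arr))

# The "S" dtype accepts arbitrary bytes, so non-ASCII data slips in silently.
mixed = np.array([b"plain", "caf\u00e9".encode("utf-8")], dtype="S10")
print(is_ascii_only(mixed))
```

The last two lines illustrate why "S" is not an ASCII type: nothing in the dtype rejects bytes above 0x7F, so a check like this has to run over the data before it can be labelled as ASCII or UTF-8.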