On Mon, Apr 24, 2017 at 7:41 PM, Nathaniel Smith <n...@pobox.com> wrote:
> But also, is it important whether strings we're loading/saving to an
> HDF5 file have the same in-memory representation in numpy as they
> would in the file? I *know* [1] no-one is reading HDF5 files using
> np.memmap :-).

Of course they do :)
https://github.com/jjhelmus/pyfive/blob/98d26aaddd6a7d83cfb189c113e172cc1b60d5f8/pyfive/low_level.py#L682

> Also, further searching suggests that HDF5 actually supports all of
> nul termination, nul padding, and space padding, and that nul
> termination is the default? How much does it help to have in-memory
> compatibility with just one of these options (and not even the default
> one)? Would we need to add the other options to be really useful for
> HDF5?

h5py actually ignores this option and only uses null termination. I have
not heard any complaints about this (though I have heard complaints about
the lack of fixed-length UTF-8).

But more generally, you're right. h5py doesn't need a corresponding NumPy
dtype for each HDF5 string dtype, though that would certainly be
*convenient*. In fact, it already (ab)uses NumPy's dtype metadata with
h5py.special_dtype to indicate a homogeneous string type for object
arrays.

I would guess h5py users have the same needs for efficient string
representations (including surrogate-escape options) as other scientific
users.
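
For readers unfamiliar with the dtype-metadata trick mentioned above, here is a rough sketch using plain NumPy only. The key name "string_hint" is a made-up placeholder, not necessarily the key h5py uses, and this is not meant to show how h5py.special_dtype is implemented. It only illustrates that an object dtype can carry a tag that a library could inspect later.

```python
import numpy as np

# Attach a tag to an object dtype via NumPy's dtype metadata.
# "string_hint" is a hypothetical key chosen for this illustration.
tagged_dtype = np.dtype(object, metadata={"string_hint": str})

# Allocate an object array with the tagged dtype and fill it with strings.
arr = np.empty(2, dtype=tagged_dtype)
arr[:] = ["alpha", "beta"]

# The array is still an ordinary object array...
print(arr.dtype.kind)

# ...but a library can read the tag back to learn the intended element type.
print(arr.dtype.metadata["string_hint"])
```

Note that the metadata is only a hint: NumPy itself does not enforce that every element is a string, which is part of why this counts as (ab)use.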