On Wed, Apr 26, 2017 at 11:38 AM, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> I remember talking with a colleague about something like that. And
> basically an annoying thing there was that if you strip the zero bytes
> in a zero padded string, some encodings (UTF16) may need one of the
> zero bytes to work right. (I think she got around it, by weird
> trickery, inverting the endianness or so and thus putting the zero bytes
> first).
> Maybe will ask her if this discussion is interesting to her. Though I
> think it might have been something like "make everything in
> hdf5/something similar work" without any actual use case, I don't know.

I don't think that will be an issue for an encoding-parameterized dtype. Its decoding machinery would have access to the full-width buffer for the item, and the encoding knows what its atomic unit is (e.g. 2 bytes for UTF-16). It's only if you have to hack around at a higher level with numpy's S arrays, which return Python byte strings that strip off the trailing NULL bytes, that you have to worry about such things. Getting a Python scalar from the numpy S array loses information in such cases.

--
Robert Kern
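A minimal sketch of the information loss described above. It assumes UTF-16-LE text stored in a zero-padded `S8` field. `decode_fixed_width` is a hypothetical helper, not a NumPy API. It only illustrates what a decoder with access to the full-width item buffer could do: trim padding in whole code units rather than in single bytes.

```python
import numpy as np


def decode_fixed_width(buf: bytes, encoding: str, unit: int) -> str:
    """Hypothetical helper: decode one fixed-width item, trimming trailing
    padding in whole code units of `unit` bytes (2 for UTF-16), not bytes."""
    end = len(buf)
    pad = b"\x00" * unit
    while end >= unit and buf[end - unit:end] == pad:
        end -= unit
    return buf[:end].decode(encoding)


# Store "AB" as little-endian UTF-16 (b'A\x00B\x00') in an 8-byte S field.
arr = np.array(["AB".encode("utf-16-le")], dtype="S8")

# Indexing returns a bytes scalar with ALL trailing NULs stripped, which
# also drops the zero high byte of the final UTF-16 code unit.
scalar = arr[0]
print(bytes(scalar))  # b'A\x00B'
try:
    scalar.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("scalar decode failed:", exc.reason)

# The full-width item buffer still holds every byte, so unit-aware trimming works.
full = arr[0:1].tobytes()
print(full)  # b'A\x00B\x00\x00\x00\x00\x00'
print(decode_fixed_width(full, "utf-16-le", 2))  # AB
```

Storing the same text as UTF-16-BE (`b'\x00A\x00B'`) puts the zero byte of each ASCII-range code unit first. That is roughly the endianness trick mentioned in the quoted message, although it only helps while the last character's low byte is nonzero.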
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion