On Tue, Apr 25, 2017 at 12:52 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 11:18 AM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
> >
> > On Tue, Apr 25, 2017 at 11:34 AM, Anne Archibald
> > <peridot.face...@gmail.com> wrote:
> >
> >> Clearly there is a need for fixed-storage-size zero-padded UTF-8; two
> >> other packages are waiting specifically for it. But specifying this
> >> requires two pieces of information: What is the encoding? and How is the
> >> length specified? I know they're not numpy-compatible, but FITS header
> >> values are space-padded; does that occur elsewhere? Are there other ways
> >> existing data specifies string length within a fixed-size field? There are
> >> some cryptographic length-specification tricks - ANSI X.293, ISO 10126,
> >> PKCS7, etc. - but they are probably too specialized to need? We should make
> >> sure we can support all the ways that actually occur.
> >
> > Agree with the UTF-8 fixed byte length strings, although I would tend
> > towards null terminated.
>
> Just to clarify some terminology (because it wasn't originally clear to me
> until I looked it up in reference to HDF5):
>
> * "NULL-padded" implies that, for a fixed width of N, there can be up to N
>   non-NULL bytes. Any extra space left over is padded with NULLs, but no
>   space needs to be reserved for NULLs.
>
> * "NULL-terminated" implies that, for a fixed width of N, there can be up
>   to N-1 non-NULL bytes. There must always be space reserved for the
>   terminating NULL.
>
> I'm not really sure if "NULL-padded" also specifies the behavior for
> embedded NULLs. It's certainly possible to deal with them: just strip
> trailing NULLs and leave any embedded ones alone. But I'm also sure that
> there are some implementations somewhere that interpret the requirement as
> "stop at the first NULL or the end of the fixed width, whichever comes
> first", effectively being NULL-terminated just not requiring the reserved
> space.

Thanks for the clarification. NULL-padded is what I meant.
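
To make the distinction concrete, here is a minimal sketch in plain Python of the two conventions as described above, plus the "stop at the first NULL" reading. The function names are made up for illustration and are not part of numpy or HDF5.

```python
def encode_null_padded(text, width, encoding="utf-8"):
    """Encode into exactly `width` bytes; up to `width` payload bytes allowed."""
    raw = text.encode(encoding)
    if len(raw) > width:
        raise ValueError("encoded text does not fit in the field")
    return raw.ljust(width, b"\x00")


def encode_null_terminated(text, width, encoding="utf-8"):
    """Encode into exactly `width` bytes; one byte is reserved for the NULL."""
    raw = text.encode(encoding)
    if len(raw) > width - 1:
        raise ValueError("no room left for the terminating NULL")
    return raw.ljust(width, b"\x00")


def decode_null_padded(field, encoding="utf-8"):
    """Strip trailing NULLs only; embedded NULLs are kept."""
    return field.rstrip(b"\x00").decode(encoding)


def decode_stop_at_first_null(field, encoding="utf-8"):
    """Stop at the first NULL or the end of the field, whichever comes first."""
    end = field.find(b"\x00")
    if end == -1:
        end = len(field)
    return field[:end].decode(encoding)


if __name__ == "__main__":
    # A 4-byte payload fits a NULL-padded field of width 4 ...
    print(encode_null_padded("abcd", 4))
    # ... but a NULL-terminated field of width 4 holds at most 3 bytes.
    print(encode_null_terminated("abc", 4))
    # The two decoders disagree once there is an embedded NULL.
    field = b"a\x00b\x00"
    print(repr(decode_null_padded(field)))
    print(repr(decode_stop_at_first_null(field)))
```

The two decoders only differ on fields with embedded NULLs, which is exactly the case the NULL-padded definition leaves open.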
I'm wondering how much of the desired functionality we could get by simply subclassing ndarray in Python. I think we mostly want to be able to view byte strings and convert to unicode if needed.

Chuck
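
As a rough illustration of the subclassing idea, something along these lines might cover the "view bytes, convert to unicode" case. This is only a sketch: the class name `UTF8View` and its `to_unicode` method are hypothetical, it assumes NULL-padded fields stored in numpy's fixed-width `S` dtype, and it ignores indexing, comparison, and everything else a real implementation would need.

```python
import numpy as np


class UTF8View(np.ndarray):
    """Hypothetical ndarray subclass over fixed-width, NULL-padded UTF-8 bytes."""

    def __new__(cls, strings, width):
        encoded = [s.encode("utf-8") for s in strings]
        # numpy would silently truncate oversized items, so check explicitly.
        for raw in encoded:
            if len(raw) > width:
                raise ValueError("encoded string exceeds the field width")
        # The 'S' dtype stores each item NULL-padded to `width` bytes.
        return np.array(encoded, dtype=f"S{width}").view(cls)

    def to_unicode(self):
        # Decode element-wise; the 'S' dtype drops trailing NULLs on access.
        return np.char.decode(self.view(np.ndarray), "utf-8")


if __name__ == "__main__":
    a = UTF8View(["abc", "héllo"], width=8)
    print(a.dtype, a.to_unicode())

    # Existing fixed-width data can be viewed without copying.
    raw = b"abc".ljust(8, b"\x00") + "héllo".encode("utf-8").ljust(8, b"\x00")
    b = np.frombuffer(raw, dtype="S8").view(UTF8View)
    print(b.to_unicode())
```

Note that the width here counts bytes, not characters, so "héllo" takes six of the eight bytes.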
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion