On Tue, Apr 25, 2017 at 1:30 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> On Tue, Apr 25, 2017 at 12:52 PM, Robert Kern <robert.k...@gmail.com> wrote:
>
>> On Tue, Apr 25, 2017 at 11:18 AM, Charles R Harris <charlesr.har...@gmail.com> wrote:
>>>
>>> On Tue, Apr 25, 2017 at 11:34 AM, Anne Archibald <peridot.face...@gmail.com> wrote:
>>>>
>>>> Clearly there is a need for fixed-storage-size zero-padded UTF-8; two
>>>> other packages are waiting specifically for it. But specifying this
>>>> requires two pieces of information: What is the encoding? And how is the
>>>> length specified? I know they're not NumPy-compatible, but FITS header
>>>> values are space-padded; does that occur elsewhere? Are there other ways
>>>> existing data specifies string length within a fixed-size field? There are
>>>> some cryptographic length-specification tricks (ANSI X9.23, ISO 10126,
>>>> PKCS7, etc.), but they are probably too specialized to need? We should
>>>> make sure we can support all the ways that actually occur.
>>>
>>> Agree with the UTF-8 fixed byte length strings, although I would tend
>>> towards null-terminated.
>>
>> Just to clarify some terminology (because it wasn't originally clear to
>> me until I looked it up in reference to HDF5):
>>
>> * "NULL-padded" implies that, for a fixed width of N, there can be up to
>> N non-NULL bytes. Any extra space left over is padded with NULLs, but no
>> space needs to be reserved for NULLs.
>>
>> * "NULL-terminated" implies that, for a fixed width of N, there can be up
>> to N-1 non-NULL bytes. There must always be space reserved for the
>> terminating NULL.
>>
>> I'm not really sure if "NULL-padded" also specifies the behavior for
>> embedded NULLs. It's certainly possible to deal with them: just strip
>> trailing NULLs and leave any embedded ones alone. But I'm also sure that
>> there are some implementations somewhere that interpret the requirement as
>> "stop at the first NULL or the end of the fixed width, whichever comes
>> first", effectively being NULL-terminated, just without requiring the
>> reserved space.
>
> Thanks for the clarification. NULL-padded is what I meant.
>
> I'm wondering how much of the desired functionality we could get by simply
> subclassing ndarray in Python. I think we mostly want to be able to view
> byte strings and convert to Unicode if needed.

And I think the really tricky part is sorting and rich comparison.
Unfortunately, the comparison function currently lives in the dtype's C
structure. I suppose we could define a C wrapper function to fill that slot.

Chuck
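The length conventions raised in this thread (NULL-padded, NULL-terminated, "stop at the first NULL", FITS-style space padding, PKCS7-style padding) can be made concrete with a short Python sketch. The function names are hypothetical, not NumPy or HDF5 API, and the sketch assumes the payload is UTF-8 and that the writer never split a multibyte character at the field boundary.

```python
def decode_null_padded(field: bytes) -> str:
    """NULL-padded: strip trailing NULLs only; embedded NULLs are kept."""
    return field.rstrip(b"\x00").decode("utf-8")


def decode_stop_at_first_null(field: bytes) -> str:
    """Looser reading: stop at the first NULL or the end of the field."""
    end = field.find(b"\x00")
    return (field if end == -1 else field[:end]).decode("utf-8")


def decode_space_padded(field: bytes) -> str:
    """FITS-style header values: trailing spaces are treated as padding."""
    return field.rstrip(b" ").decode("utf-8")


def decode_pkcs7(field: bytes) -> str:
    """PKCS7-style: the last byte gives the pad length n; all n pad bytes equal n."""
    n = field[-1] if field else 0
    if not 1 <= n <= len(field) or field[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return field[:-n].decode("utf-8")


def encode_fixed(text: str, width: int, terminated: bool = False) -> bytes:
    """Encode to a fixed width; NULL-terminated reserves one byte for the NULL."""
    data = text.encode("utf-8")
    capacity = width - 1 if terminated else width
    if len(data) > capacity:
        raise ValueError(f"{len(data)} bytes do not fit in {capacity}")
    return data.ljust(width, b"\x00")
```

The two NULL-based readers only disagree when the payload contains embedded NULLs, which is exactly the ambiguity Robert points out: `decode_null_padded(b"a\x00b\x00\x00")` keeps `"a\x00b"`, while `decode_stop_at_first_null` returns just `"a"`.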
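The idea of getting most of the way with a Python-level ndarray subclass might look roughly like the sketch below. `UTF8Array` and `to_unicode` are hypothetical names, not existing NumPy API. The sketch stores NULL-padded UTF-8 in NumPy's existing `S` dtype, which pads with NULLs and drops trailing NULLs on element access, and decodes on demand with `np.char.decode`. It also shows why a bytewise comparison would give the right order for sorting: UTF-8 byte order matches Unicode code-point order. This does not address the C-level comparison slot; it only suggests what that slot would need to compute.

```python
import numpy as np


class UTF8Array(np.ndarray):
    """Hypothetical ndarray subclass over fixed-width, NULL-padded UTF-8 bytes."""

    def __new__(cls, strings, width):
        encoded = [s.encode("utf-8") for s in strings]
        too_long = [b for b in encoded if len(b) > width]
        if too_long:
            raise ValueError(f"encoded values exceed {width} bytes: {too_long}")
        # The 'S' dtype fills unused bytes with NULLs, i.e. NULL-padded storage.
        return np.array(encoded, dtype=f"S{width}").view(cls)

    def to_unicode(self):
        """Decode the raw bytes into NumPy's 'U' (UCS4) dtype."""
        return np.char.decode(self.view(np.ndarray), "utf-8")


words = UTF8Array(["zebra", "é", "apple", "€uro"], width=8)
# Sorting the raw bytes and sorting the decoded strings give the same order,
# because UTF-8 byte order matches Unicode code-point order.
by_bytes = np.char.decode(np.sort(words.view(np.ndarray)), "utf-8")
by_code_point = np.sort(words.to_unicode())
print(by_bytes.tolist() == by_code_point.tolist())  # True
```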
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion