On Jan 22, 2014, at 1:13 PM, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
>
> It's not safe to stop removing the null bytes. This is how numpy determines
> the length of the strings in a dtype='S' array. The strings are not
> "fixed-width" but rather have a maximum width.

Exactly, but folks have told us on this list that they want to use (and are using) the 'S' style for arbitrary bytes, NOT for text. In which case you wouldn't want to remove null bytes. This is more evidence that 'S' was designed to handle C-style one-byte-per-char strings, and NOT arbitrary bytes, and thus not to map directly to the py2 string type (you can store null bytes in a py2 string).

Which brings me back to my original proposal: properly map the 'S' type to the py3 data model, and maybe add some kind of fixed-width bytes style if there is a use case for that. I still have no idea what the use case might be.

> If the trailing nulls are not removed then you would get:
>
> >>> a[0]
> b'a\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> >>> len(a[0])
> 9
>
> And I'm sure that someone would get upset about that.

Only if they are using it for text, which you "should not" do with py3.

> Having the null bytes removed and a str (on Py2) object returned is precisely
> the use case that distinguishes it from np.uint8.

But that was because it was designed to be used with text. And if you want text, then you should use py3 strings, not bytes. And if you really want bytes, then you wouldn't want null bytes removed.

> The other differences are the
> removal of arithmetic operations.

And 'S' is treated as an atomic element; I'm not sure how you can do that cleanly with uint8.

> Some more oddities:
>
> >>> a[0] = 1
> >>> a
> array([b'1', b'string', b'of', b'different', b'length', b'words'],
>       dtype='|S9')
> >>> a[0] = None
> >>> a
> array([b'None', b'string', b'of', b'different', b'length', b'words'],
>       dtype='|S9')

More evidence that this is a text type...
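
For concreteness, here is a minimal sketch of the two behaviors being compared, assuming a reasonably recent NumPy on Python 3. The array contents are made up for illustration and are not the array from the quoted session. It shows that an 'S9' element is padded with nulls in the buffer, that indexing hides the padding, and that a uint8 view keeps every byte at the cost of the one-element-per-string structure.

```python
import numpy as np

# Hypothetical example data: byte strings stored with a maximum width of 9.
a = np.array([b'a', b'string', b'different'], dtype='S9')

# In the underlying buffer every element occupies exactly 9 bytes,
# padded on the right with null bytes.
raw = a.tobytes()
print(len(raw))    # 3 elements * 9 bytes
print(raw[:9])     # first element, padding included

# Indexing strips the trailing nulls, so the padding is invisible
# and a value that really ends in b'\x00' cannot be told apart from padding.
first = a[0]
print(first, len(first))

# Viewing the same buffer as uint8 keeps every byte, nulls included,
# but each string is now a row of 9 numbers rather than one atomic element.
u = a.view(np.uint8).reshape(len(a), a.dtype.itemsize)
print(u.shape)
print(u[0].tobytes())
```

The uint8 view is only one possible way to get at the raw bytes; it is shown here to illustrate the trade-off, not as a recommended replacement for 'S'.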
-Chris

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion