On Jan 22, 2014, at 1:13 PM, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:

>
> It's not safe to stop removing the null bytes. This is how numpy determines
> the length of the strings in a dtype='S' array. The strings are not
> "fixed-width" but rather have a maximum width.

Exactly--but folks have told us on this list that they want to use (and
are using) the 'S' style for arbitrary bytes, NOT for text. In which case
you wouldn't want to remove null bytes. This is more evidence that 'S'
was designed to handle C-style one-byte-per-char strings, and NOT
arbitrary bytes, and thus not to map directly to the py2 string type
(you can store null bytes in a py2 string).

Which brings me back to my original proposal: properly map the 'S'
type to the py3 data model, and maybe add some kind of fixed-width
bytes style if there is a use case for that. I still have no idea what
the use case might be.
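
For anyone following along, here is a minimal sketch of the behavior under
discussion, assuming a reasonably recent NumPy on py3. The names (payload,
arr, item, raw, inner) are only illustrative. It shows that reading an
element of an 'S' array drops trailing null bytes, even though the
underlying buffer still holds them:

```python
import numpy as np

# A 4-byte payload whose trailing nulls are meaningful (binary data, not text).
payload = b"ab\x00\x00"

# Store it in a dtype='S4' array.
arr = np.array([payload], dtype="S4")

# Reading the element back strips the trailing nulls.
item = bytes(arr[0])
print(item, len(item))

# The raw buffer still contains all four bytes.
raw = arr.tobytes()
print(raw, len(raw))

# Nulls in the middle of an element are kept; only trailing ones are dropped.
inner = bytes(np.array([b"a\x00b"], dtype="S3")[0])
print(inner, len(inner))
```

So the round trip through 'S' is lossy for arbitrary bytes that happen to
end in nulls, which is the crux of the disagreement above.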

> If the trailing nulls are not removed then you would get:
>
>>>> a[0]
> b'a\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>>> len(a[0])
> 9
>
> And I'm sure that someone would get upset about that.

Only if they are using it for text, which you "should not" do with py3.

> Having the null bytes removed and a str (on Py2) object returned is precisely
> the use case that distinguishes it from np.uint8.

But that was because it was designed to be used with text. And if you
want text, then you should use py3 strings, not bytes. And if you
really want bytes, then you wouldn't want null bytes removed.

> The other differences are the
> removal of arithmetic operations.

And 'S' is treated as an atomic element; I'm not sure how you can do
that cleanly with uint8.
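
To make the "atomic element" point concrete, here is a rough sketch, again
assuming a recent NumPy; the variable names are made up for illustration.
With 'S' each word is one array element, whereas the uint8 equivalent needs
an extra axis and you have to handle the padding yourself:

```python
import numpy as np

# With 'S', each word is a single element of a 1-d array.
words = np.array([b"string", b"of", b"words"], dtype="S6")
print(words.shape)

# Comparisons work per word.
mask = words == b"of"
print(mask)

# The same memory viewed as uint8 needs a second axis of length itemsize,
# and the null padding shows up as explicit zeros.
as_u8 = words.view(np.uint8).reshape(len(words), words.dtype.itemsize)
print(as_u8.shape)
print(as_u8[1].tolist())
```

With uint8 the padding is preserved exactly, but you lose the per-element
comparison and have to carry the extra dimension around.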

> Some more oddities:
>
>>>> a[0] = 1
>>>> a
> array([b'1', b'string', b'of', b'different', b'length', b'words'],
>      dtype='|S9')
>>>> a[0] = None
>>>> a
> array([b'None', b'string', b'of', b'different', b'length', b'words'],
>      dtype='|S9')

More evidence that this is a text type.....
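
A small sketch reproducing the oddities quoted above, assuming a recent
NumPy on py3 (the array contents are just the example from this thread).
Non-bytes values are converted through their string form on assignment,
and values that are too long are silently truncated to the item size:

```python
import numpy as np

a = np.array([b"a", b"string", b"of", b"different", b"length", b"words"],
             dtype="S9")

# Assigning an int stores its decimal text form.
a[0] = 1
from_int = bytes(a[0])
print(from_int)

# Assigning None stores the text "None".
a[0] = None
from_none = bytes(a[0])
print(from_none)

# Assigning more than 9 bytes truncates without an error.
a[0] = b"0123456789abcdef"
truncated = bytes(a[0])
print(truncated)
```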

-Chris
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
