On Thu, Jan 23, 2014 at 12:10 PM, <josef.p...@gmail.com> wrote:

> > Exactly -- but what should those conversion/casting rules be? We can't
> > decide that unless we decide if 'S' is for text or for arbitrary bytes --
> > it can't be both. I say text, that's what it's mostly trying to do
> > already. But if it's bytes, fine, then some things still need cleaning
> > up, and we could really use a one-byte-text type. and if it's text, then
> > we may need a bytes dtype.
>
> (remember I'm just a balcony muppet)
me too ;-)

> As far as I understand all codecs have the same ascii part.

Nope -- certainly not the multi-byte codecs. And one of the key points of
utf-8 is that the ascii part is compatible -- none of the other full-unicode
encodings are. Many of the one-byte-per-char ones do share the ascii part,
but not all, or not completely.

> So I would cast on ascii and raise on anything else.

Still a fine option -- clearly defined and quite useful for scientific text.
However, I would prefer latin-1 -- that way you might get garbage for the
non-ascii parts, but it wouldn't raise an exception, and it round-trips
through encoding/decoding. And you would have a somewhat more useful subset,
including the latin-language characters and symbols like the degree symbol,
etc.

> or follow whatever the convention of numpy is:
>
> >>> s = -256
> >>> np.array((s,), dtype=np.uint8)[0] == s
> False
> >>> s = -1
> >>> np.array((s,), dtype=np.uint8)[0] == s
> False

I think text is distinct enough from numbers that we don't need to do the
same thing -- and this is the result of well-defined casting rules built into
the compiler (and hardware?) for the numeric types. I don't think we have
either the standard or the compiler support for text conversions like that.

-CHB

PS: this is interesting, on py2:

In [176]: a = np.array((2222,), dtype='S')

In [177]: a
Out[177]: array(['2'], dtype='|S1')

It converts it to a string, but only grabs the first character? (Is it
determining the size before converting to a string?)

And this:

In [182]: a = np.array(2222, dtype='S')

In [183]: a
Out[183]: array('2222', dtype='|S24')

24? Where did that come from?

> Josef
>
> > Key here is that we don't have the option of not breaking anything,
> > because there is a lot already broken.
> >
> > -Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
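
PPS: to make the ascii vs. latin-1 trade-off above concrete, here is a
minimal sketch in plain Python 3 (standard library only). It does not show
anything numpy does with 'S' today; the variable names are just for
illustration. It checks that latin-1 decodes every byte value and round-trips,
that ascii raises on the upper half, and that utf-8 (unlike utf-16) leaves
ascii text byte-for-byte unchanged.

```python
# Every possible byte value, 0..255.
raw = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding never fails
# and encoding the result gives back the original bytes.
text = raw.decode("latin-1")
latin1_round_trips = text.encode("latin-1") == raw

# ascii only defines 0..127, so the same input raises.
try:
    raw.decode("ascii")
    ascii_raises = False
except UnicodeDecodeError:
    ascii_raises = True

# utf-8 is ascii-compatible; utf-16 is not (two bytes per character here).
sample = "temp"
utf8_matches_ascii = sample.encode("utf-8") == sample.encode("ascii")
utf16_matches_ascii = sample.encode("utf-16-le") == sample.encode("ascii")

# The degree symbol (U+00B0) fits in one latin-1 byte.
degree = "25 \u00b0C"
degree_bytes = degree.encode("latin-1")

print(latin1_round_trips, ascii_raises)
print(utf8_matches_ascii, utf16_matches_ascii)
print(len(degree), len(degree_bytes))
```

The point of the latin-1 choice is only that nothing is lost or rejected on
the way through; whether the non-ascii characters it produces are the ones
the data's author meant is a separate question.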
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion