Also from the docstring:

"""
.. note::
   The `chararray` class exists for backwards compatibility with
   Numarray, it is not recommended for new development. Starting from numpy
   1.4, if one needs arrays of strings, it is recommended to use arrays of
   `dtype` `object_`, `string_` or `unicode_`, and use the free functions
   in the `numpy.char` module for fast vectorized string operations.
"""

Neil Crighton wrote:
> Ah, ok, thanks. I missed the explanation in the doc string because I'm using
> version 1.3 and forgot to check the web docs.
>
> For the record, this was my bug: I read a fits binary table with pyfits. One of
> the table fields was a chararray containing a bunch of flags ('A','B','C','D').
> I tried to use in1d() to identify all entries with flags of 'C' or 'D'. So
>
> >>> c = pyfits_table.chararray_column
> >>> mask = np.in1d(c, ['C', 'D'])
>
> It turns out the actual stored values in the chararray were 'A ', 'B ', 'C '
> and 'D '. in1d() converts the chararray to an ndarray before performing the
> comparison, so none of the entries matches 'C' or 'D'.
>
This inconsistency is fixed in Numpy 1.4 (which included a major overhaul of
chararrays). in1d will perform the auto whitespace-stripping on chararrays,
but not on regular ndarrays of strings.

> What is the best way to ensure this doesn't happen to other people? We could
> change the array set operations to special-case chararrays, but this seems like
> an ugly solution. Is it possible to change something in pyfits to avoid this?
>
Pyfits continues to use chararray because dropping it would break existing
code that relies on this behavior. There are also many use cases where the
behavior is desirable, particularly with fixed-length strings in tables.

The best way to get around it from your code is to cast the chararray that
pyfits returns to a regular ndarray. The cast does not perform a copy, so it
should be very efficient:

    In [6]: from numpy import char

    In [7]: import numpy as np

    In [8]: c = char.array(['a ', 'b '])

    In [9]: c
    Out[9]: chararray(['a', 'b'], dtype='|S2')

    In [10]: np.asarray(c)
    Out[10]: array(['a ', 'b '], dtype='|S2')

I suggest casting to either chararray or ndarray, depending on whether you
want the auto-whitespace-stripping behavior.
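
If you would rather not depend on chararray semantics at all, another option is to strip the padding yourself with the free functions in `numpy.char` that the docstring mentions. The following is only a rough sketch under some assumptions: the `flags` array is a made-up stand-in for the pyfits column, it uses Python 3 unicode strings rather than the `|S2` bytes shown above, and it calls `np.isin`, the newer spelling of `in1d`, which is not available in 1.3 or 1.4.

```python
import numpy as np

# Hypothetical stand-in for a padded flag column read from a table.
flags = np.array(['A ', 'B ', 'C ', 'D ', 'C '])
wanted = ['C', 'D']

# On a plain ndarray the trailing space is significant, so nothing matches.
padded_mask = np.isin(flags, wanted)

# Strip trailing whitespace explicitly, then test membership.
stripped = np.char.rstrip(flags)
mask = np.isin(stripped, wanted)

print(padded_mask)
print(mask)
```

This keeps the stripping visible at the call site, at the cost of one temporary array for the stripped values.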

Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA