Re: [Numpy-discussion] chararray stripping trailing whitespace a bug?

Michael Droettboom Mon, 10 May 2010 13:52:32 -0700

Also from the docstring:

"""
.. note::
   The `chararray` class exists for backwards compatibility with
   Numarray, it is not recommended for new development. Starting from numpy
   1.4, if one needs arrays of strings, it is recommended to use arrays of
   `dtype` `object_`, `string_` or `unicode_`, and use the free functions
   in the `numpy.char` module for fast vectorized string operations.
"""



Neil Crighton wrote:
> Ah, ok, thanks. I missed the explanation in the doc string because I'm using
> version 1.3 and forgot to check the web docs.
>
> For the record, this was my bug: I read a fits binary table with pyfits.  One of
> the table fields was a chararray containing a bunch of flags ('A','B','C','D').
> I tried to use in1d() to identify all entries with flags of 'C' or 'D'. So
>
>   
> >>> c = pyfits_table.chararray_column
> >>> mask = np.in1d(c, ['C', 'D'])
>
> It turns out the actual stored values in the chararray were 'A  ', 'B  ', 'C  '
> and 'D  '. in1d() converts the chararray to an ndarray before performing the
> comparison, so none of the entries matches 'C' or 'D'.
>   
This inconsistency is fixed in NumPy 1.4, which included a major
overhaul of chararrays: in1d now performs the automatic
whitespace stripping on chararrays, but not on regular ndarrays of strings.

> What is the best way to ensure this doesn't happen to other people?  We could
> change the array set operations to special-case chararrays, but this seems like
> an ugly solution. Is it possible to change something in pyfits to avoid this?
>   
Pyfits continues to use chararray since not doing so would break 
existing code relying on this behavior.  And there are many use cases 
where this behavior is desirable, particularly with fixed-length strings 
in tables. 

The best way to get around it from your code is to cast the chararray
that pyfits returns to a regular ndarray.  The cast does not perform a
copy, so it should be very efficient:

In [6]: from numpy import char
In [7]: import numpy as np
In [8]: c = char.array(['a ', 'b '])
In [9]: c
Out[9]:
chararray(['a', 'b'],
      dtype='|S2')
In [10]: np.asarray(c)
Out[10]:
array(['a ', 'b '],
      dtype='|S2')
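
As a quick check of the no-copy claim, here is a minimal sketch. It assumes a
recent NumPy on Python 3 (so the dtype is unicode rather than '|S2'), and the
names `plain` and `shares` are only illustrative:

```python
import numpy as np

# chararray built the same way as in the session above
c = np.char.array(['a ', 'b '])

# Cast to a base-class ndarray; asarray drops the chararray subclass
plain = np.asarray(c)

# True if both objects are backed by the same buffer (no copy was made)
shares = np.shares_memory(c, plain)

print(type(plain).__name__, shares)
# chararray strips trailing whitespace on item access, the ndarray does not
print(repr(c[0]), repr(plain[0]))
```

Because both objects share one buffer, writing through either of them should
be visible through the other.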

I suggest casting to either chararray or ndarray, depending on
whether you want the auto-whitespace-stripping behavior.
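
If you take the ndarray route but still need to match padded flags like the
ones in the original question, one option is to strip explicitly with the
numpy.char free functions that the docstring recommends. A rough sketch,
assuming a made-up `flags` array in place of the pyfits column, and using
np.isin (the newer counterpart of in1d, only available in later NumPy
releases):

```python
import numpy as np

# Hypothetical stand-in for the padded flag column read from the FITS table
flags = np.array(['A  ', 'B  ', 'C  ', 'D  ', 'C  '])

# On a plain ndarray the stored values are compared as-is, padding included
raw_mask = np.isin(flags, ['C', 'D'])

# Strip trailing whitespace explicitly, then test membership
stripped = np.char.rstrip(flags)
mask = np.isin(stripped, ['C', 'D'])

print(raw_mask)
print(mask)
```

This keeps the stripping visible in your own code instead of relying on the
array type to do it implicitly.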

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
