I now have a rather large patch ready that addresses the following issues with chararrays. Would it be possible to get SVN commit privileges, or would you prefer a patch file?

1) Fix bugs in Trac

http://projects.scipy.org/numpy/ticket/1199 (chararray.expandtabs broken)
http://projects.scipy.org/numpy/ticket/856 (chararray __mod__ error)
http://projects.scipy.org/numpy/ticket/855 (chararray __mul__ error)
http://projects.scipy.org/numpy/ticket/1231 (chararray methods ignore all arguments following the first argument that evaluates to False)
http://projects.scipy.org/numpy/ticket/1235 (Coercing object arrays to string arrays has surprising behaviour)
http://projects.scipy.org/numpy/ticket/1240 (Casting from Unicode to String array ignores exception)
http://projects.scipy.org/numpy/ticket/1241 (Array constructed with mixture of str and unicode objects fails length detection)

I can provide small individual patches for some of these if necessary, but some are interrelated and can only be fixed by the "whole enchilada".

2) Improve documentation

Every method now has a docstring, and a new page of routines has been added to the Sphinx tree.

3) Improve unit test coverage

Full line-by-line coverage of defchararray.py, as well as lots of hairy Unicode side cases.

4a) Create C-based vectorized string operations

This is benchmarking about 5x faster than the old Python-based looping on a large database of around 20k astronomical objects.

4b) Refactor chararray class in terms of those

4c) Design and create an interface to those methods that will be the "right way" going forward

All vectorized string operations are now available as regular functions in the numpy.char namespace. Usage of the chararray view class is only recommended for numarray backward compatibility.

A few side notes:

http://projects.scipy.org/numpy/ticket/1200 (chararray.rstrip inconsistency)

This bug I believe should be marked as "won't fix". The inconsistent handling of trailing whitespace is an unfortunate "feature" of the chararray class, and I am wary that fixing it may break backward compatibility.
However, the new free functions in numpy.char do not have this inconsistency, so they should be recommended for new code.

http://projects.scipy.org/numpy/ticket/1240 (Casting from Unicode to String array ignores exception)

This bug probably needs review by someone deeply familiar with the low-level internals, as it affects more than just string and unicode arrays. It doesn't break any of the unit tests, for what it's worth ;)

Cheers,
Mike

David Goldsmith wrote:
> Great, thanks!
>
> DG
>
> On Fri, Sep 25, 2009 at 6:07 AM, Michael Droettboom wrote:
>
>     David Goldsmith wrote:
>     > On Tue, Sep 22, 2009 at 4:02 PM, Ralf Gommers wrote:
>     >
>     >     On Tue, Sep 22, 2009 at 1:58 PM, Michael Droettboom wrote:
>     >
>     >         Trac has these bugs.  Any others?
>     >
>     >         http://projects.scipy.org/numpy/ticket/1199
>     >         http://projects.scipy.org/numpy/ticket/1200
>     >         http://projects.scipy.org/numpy/ticket/856
>     >         http://projects.scipy.org/numpy/ticket/855
>     >         http://projects.scipy.org/numpy/ticket/1231
>     >
>     >     This one:
>     >
>     >     http://article.gmane.org/gmane.comp.python.numeric.general/23638/match=chararray
>     >
>     >     Cheers,
>     >     Ralf
>     >
>     > That last one never got "promoted" to a ticket?
>
>     It's a symptom of this bug, that I created and produced a patch for
>     yesterday:
>
>     http://projects.scipy.org/numpy/ticket/1235
>
>     Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
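
As an illustration of the free-function interface described in item 4c of the message, here is a minimal sketch. It is not taken from the patch itself: the sample array and variable names are made up, and it assumes a NumPy installation that provides upper, rstrip, add and multiply in the numpy.char namespace.

```python
import numpy as np

# Hypothetical sample data: an ordinary string array, not a chararray view.
names = np.array(["ngc 1275  ", "m31", "ic 342 "])

# The free functions apply a string operation element-wise to a plain array.
upper = np.char.upper(names)

# Trailing whitespace is removed only where it is asked for explicitly.
stripped = np.char.rstrip(names)

# Element-wise concatenation and repetition, the free-function
# counterparts of the chararray "+" and "*" operators.
tagged = np.char.add(stripped, "!")
repeated = np.char.multiply(stripped, 2)

for row in zip(names, upper, stripped, tagged, repeated):
    print(row)
```

Because the inputs are plain arrays, no view onto the chararray class is needed, which matches the message's recommendation to reserve that class for numarray backward compatibility.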
