Hi, I'm writing to report what look like two bugs in the handling of strings of length 0. (I'm using NumPy 1.4.0.dev7746 on Mac OS X 10.5.8. The problems below occur both with Python 2.5 compiled 32-bit and with Python 2.6 compiled 64-bit.)
Bug #1: A problem arises when you try to create a record array passing a type of '|S0'.

>>> Cols = [['test']*10,['']*10]

When no dtype is passed, this is turned into a recarray with no problem:

>>> np.rec.fromarrays(Cols)
rec.array([('test', ''), ('test', ''), ('test', ''), ('test', ''),
       ('test', ''), ('test', ''), ('test', ''), ('test', ''),
       ('test', ''), ('test', '')],
      dtype=[('f0', '|S4'), ('f1', '|S1')])

However, trouble arises when I try to pass a length-0 dtype explicitly. Every record after the first has the first character of column 'A' overwritten with a null byte:

>>> d = np.dtype([('A', '|S4'), ('B', '|S')])
>>> np.rec.fromarrays(Cols,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', '')],
      dtype=[('A', '|S4'), ('B', '|S0')])

The same thing occurs if I cast to np arrays before passing them to np.rec.fromarrays:

>>> _arrays = [np.array(Cols[0],'|S4'),np.array(Cols[1],'|S')]
>>> _arrays
[array(['test', 'test', 'test', 'test', 'test', 'test', 'test', 'test',
       'test', 'test'],
      dtype='|S4'),
 array(['', '', '', '', '', '', '', '', '', ''],
      dtype='|S1')]
>>> np.rec.fromarrays(_arrays,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', '')],
      dtype=[('A', '|S4'), ('B', '|S0')])

(Btw, why does np.array(['',''],'|S') return an array with dtype '|S1'? Why not '|S0'? Are length-0 strings being avoided explicitly? If so, why?)

Bug #2: I'm not sure this is a bug, but it is annoying: np.dtype won't accept '|S0' as a type argument.

>>> np.dtype('|S0')
TypeError: data type not understood

I have to do something like this to get what I want:

>>> d = np.dtype('|S')
>>> d
dtype('|S0')

Is this intended? Regardless, this inconsistency also means that things like:

>>> np.dtype(d.descr)

can fail even when d is a properly constructed dtype object with a '|S0' type, which seems a little perverse.
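In case it helps anyone check their own build, here is a small sketch that gathers the cases above into one script. The helper names `attempt` and `probe_zero_length_strings` are made up for this message, not part of NumPy, and I use bytes literals so the same script can run on Python 3 as well. It only records what happens (a value or the exception raised), since I would expect the results to differ between NumPy versions.

```python
import numpy as np


def attempt(fn):
    """Run fn() and return its result, or a short description of the error."""
    try:
        return fn()
    except Exception as exc:  # deliberately broad: we only want to report
        return "%s: %s" % (type(exc).__name__, exc)


def probe_zero_length_strings():
    """Collect how this NumPy build treats zero-length string dtypes."""
    results = {}

    # Bug #2: is an explicit '|S0' accepted by np.dtype?
    results["explicit_S0"] = attempt(lambda: repr(np.dtype('|S0')))

    # What does the unsized '|S' become, and does its descr round-trip?
    results["unsized_S"] = attempt(lambda: repr(np.dtype('|S')))
    results["descr_round_trip"] = attempt(
        lambda: repr(np.dtype(np.dtype('|S').descr)))

    # The side question: which dtype do empty strings get with '|S'?
    results["empty_array_dtype"] = attempt(
        lambda: repr(np.array([b'', b''], '|S').dtype))

    # Bug #1: is column 'A' still intact after building the recarray?
    def column_a_intact():
        cols = [[b'test'] * 10, [b''] * 10]
        d = np.dtype([('A', '|S4'), ('B', '|S')])
        rec = np.rec.fromarrays(cols, dtype=d)
        return bool((rec['A'] == b'test').all())

    results["column_A_intact"] = attempt(column_a_intact)
    return results


if __name__ == "__main__":
    for name, outcome in sorted(probe_zero_length_strings().items()):
        print("%-18s %r" % (name, outcome))
```

On the build described above I would expect "column_A_intact" to come back False and "explicit_S0" to report the TypeError, but I have not run this exact script there.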
Am I just not supposed to be working with length-0 string columns, period?

Thanks,
Dan
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion