Hi, I'm writing to report what look like two bugs in the handling of
strings of length 0.  (I'm using NumPy 1.4.0.dev7746 on Mac OS X 10.5.8.  The
problems below occur both with Python 2.5 compiled 32-bit and with
Python 2.6 compiled 64-bit.)


Bug #1:
A problem arises when you try to create a record array passing a type of
'|S0'.

>>> Cols = [['test']*10,['']*10]

When no dtype is passed, the recarray is created with no problem:

>>> np.rec.fromarrays(Cols)
rec.array([('test', ''), ('test', ''), ('test', ''), ('test', ''),
       ('test', ''), ('test', ''), ('test', ''), ('test', ''),
       ('test', ''), ('test', '')],
      dtype=[('f0', '|S4'), ('f1', '|S1')])

However, trouble arises when I try to pass a length-0 dtype explicitly: in
every record after the first, the first character of column 'A' comes back
as '\x00'.

>>> d = np.dtype([('A', '|S4'), ('B', '|S')])
>>> np.rec.fromarrays(Cols,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', '')],
      dtype=[('A', '|S4'), ('B', '|S0')])

The same thing occurs if I cast to np arrays before passing to
np.rec.fromarrays:

>>> _arrays = [np.array(Cols[0],'|S4'),np.array(Cols[1],'|S')]
>>> _arrays
[array(['test', 'test', 'test', 'test', 'test', 'test', 'test', 'test',
       'test', 'test'],
      dtype='|S4'),
 array(['', '', '', '', '', '', '', '', '', ''],
      dtype='|S1')]

>>> np.rec.fromarrays(_arrays,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', '')],
      dtype=[('A', '|S4'), ('B', '|S0')])
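
A possible workaround, as a minimal sketch only: since the no-dtype call
above works, one option is to skip the explicit '|S0' dtype and pass just
the field names, letting the formats come from the arrays. This assumes a
width-1 column is an acceptable stand-in for the width-0 one; the variable
names are illustrative.

```python
import numpy as np

cols = [['test'] * 10, [''] * 10]

# Convert each column to a byte-string array first. NumPy gives the
# all-empty column an itemsize of 1 here, not 0.
arrays = [np.array(cols[0], 'S4'), np.array(cols[1], 'S')]

# Pass only the field names; the field formats are taken from the
# dtypes of the arrays themselves, so no zero-width dtype is involved.
rec = np.rec.fromarrays(arrays, names='A,B')

print(rec.dtype)
print(rec['A'][:3])
```

This does not fix the underlying problem; it only avoids constructing a
record dtype with a zero-width string field.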

(Btw, why does np.array(['',''],'|S') return an array with dtype '|S1'?
Why not '|S0'?  Are length-0 arrays being avoided explicitly? If so, why?)


Bug #2:  I'm not sure this is a bug, but it is annoying: np.dtype won't
accept '|S0' as a type argument.

>>> np.dtype('|S0')
TypeError: data type not understood

I have to do something like this:

>>> d = np.dtype('|S')
>>> d
dtype('|S0')

to get what I want.   Is this intended?  Regardless, this inconsistency also
means that things like:

>>> np.dtype(d.descr)

can fail even when d is a properly constructed dtype object with a '|S0'
type, which seems a little perverse.
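
One way to sidestep the round trip through type strings, sketched under
assumptions: rebuild the dtype from the field dtype objects instead of from
d.descr, so that no '|S0' string has to be parsed. The helper name
rebuild_dtype is hypothetical, the sketch ignores field offsets and titles,
and it is untested against 1.4.0.dev7746.

```python
import numpy as np

def rebuild_dtype(d):
    """Rebuild a structured dtype from its fields rather than from d.descr.

    Each field's type is passed along as a dtype object, so no type
    string such as '|S0' needs to be parsed. Offsets and titles are
    not preserved; this is a sketch, not a general replacement.
    """
    d = np.dtype(d)
    return np.dtype([(name, d.fields[name][0]) for name in d.names])
```

With d a structured dtype, rebuild_dtype(d) would then stand in for
np.dtype(d.descr).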


Am I just not supposed to be working with length-0 string columns, period?



Thanks,
Dan