[Numpy-discussion] Automatic string length in recarray
Hi,

I'm having trouble creating np.string_ fields in recarrays. If I create a recarray using

np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])

the result looks fine:

rec.array([(1, 'hello'), (2, 'world')], dtype=[('a', …
Re: [Numpy-discussion] Automatic string length in recarray
On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
> But if I want to specify the data types:
>
> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
> ('b',np.str)])
>
> the string field is set to a length of zero:
>
> rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
>
> I need to specify datatypes for all numerical types since I care about
> int8/16/32, etc, but I would like to benefit from the auto string
> length detection that works if I don't specify datatypes. I tried
> replacing np.str by None but no luck. I know I can specify '|S5' for
> example, but I don't know in advance what the string length should be
> set to.

This is a limitation of the way the dtype code works, and AFAIK there's no easy fix. In some code I wrote recently, I had to loop through the entire list of records, i.e. max(len(foo[2]) for foo in records).

David

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
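David's loop can be wrapped into a small helper. The sketch below defines a hypothetical function, fromrecords_autostr (not part of NumPy): fields declared as the builtin str (or np.str_) are sized to their longest value, while numeric fields keep exactly the type you ask for. It assumes Python 3 and a current NumPy, so strings become Unicode 'U' fields rather than the 'S' byte strings shown above; note also that np.str, an alias of the builtin str, has since been removed from NumPy.

```python
import numpy as np

def fromrecords_autostr(records, dtype):
    """Hypothetical helper (not a NumPy API): like np.rec.fromrecords, but any
    field declared as str gets a width equal to its longest value."""
    fields = []
    for i, (name, kind) in enumerate(dtype):
        if kind in (str, np.str_):
            # One pass over all records per string field, as David describes.
            width = max((len(rec[i]) for rec in records), default=0)
            # Never emit a zero-width field such as 'U0'.
            fields.append((name, f"U{max(width, 1)}"))
        else:
            # Numeric (or already fully specified) fields are kept as given.
            fields.append((name, kind))
    return np.rec.fromrecords(records, dtype=fields)

r = fromrecords_autostr([(1, 'hello'), (2, 'world')],
                        [('a', np.int8), ('b', str)])
print(r.dtype)
```

The cost is one extra pass over the data per string field, which is the price David mentions.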
Re: [Numpy-discussion] Automatic string length in recarray
On Nov 3, 2009, at 11:43 AM, David Warde-Farley wrote:
> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
>> But if I want to specify the data types:
>>
>> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
>> ('b',np.str)])
>>
>> the string field is set to a length of zero:
>>
>> rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
>>
>> I need to specify datatypes for all numerical types since I care about
>> int8/16/32, etc, but I would like to benefit from the auto string
>> length detection that works if I don't specify datatypes. I tried
>> replacing np.str by None but no luck. I know I can specify '|S5' for
>> example, but I don't know in advance what the string length should be
>> set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records i.e. max(len(foo[2]) for foo in
> records).

As a workaround, perhaps you could use np.object instead of np.str when defining your array. You can then get the maximum string length by looping, as David suggested, and then use .astype to transform your array...
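Pierre's object-field workaround can be sketched as follows, assuming Python 3 and a current NumPy (so the final field is a Unicode 'U' field). The temporary array is built with np.array rather than np.rec.fromrecords, since the latter raised the TypeError reported further down this thread. The final conversion applies .astype column by column instead of to the whole structured array, which avoids relying on how a particular NumPy version casts between structured dtypes. The variable names are illustrative only.

```python
import numpy as np

records = [(1, 'hello'), (2, 'world')]

# 1. Store the strings in an object field, so no length has to be chosen yet.
tmp = np.array(records, dtype=[('a', np.int8), ('b', object)])

# 2. Find the longest string by looping over the object field.
width = max(len(s) for s in tmp['b'])

# 3. Convert each field to its final type; 'b' becomes a fixed-width string.
final_dtype = np.dtype([('a', np.int8), ('b', f'U{width}')])
out = np.empty(tmp.shape, dtype=final_dtype)
for name in final_dtype.names:
    out[name] = tmp[name].astype(final_dtype[name])

# View as a recarray to get attribute access (r.a, r.b).
r = out.view(np.recarray)
print(r.dtype, r.b)
```

The numeric field keeps its int8 type throughout; only the string field changes from object to a fixed width.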
Re: [Numpy-discussion] Automatic string length in recarray
Pierre GM-2 wrote:
>
> As a workaround, perhaps you could use np.object instead of np.str
> while defining your array. You can then get the maximum string length
> by looping, as David suggested, and then use .astype to transform your
> array...
>

I tried this:

np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),('b',np.object_)])

but I get a TypeError:

---
TypeError Traceback (most recent call last)

/Users/tom/ in ()

/Users/tom/Library/Python/2.6/site-packages/numpy/core/records.pyc in fromrecords(recList, dtype, shape, formats, names, titles, aligned, byteorder)
    625 res = retval.view(recarray)
    626
--> 627 res.dtype = sb.dtype((record, res.dtype))
    628 return res
    629

/Users/tom/Library/Python/2.6/site-packages/numpy/core/records.pyc in __setattr__(self, attr, val)
    432 if attr not in fielddict:
    433 exctype, value = sys.exc_info()[:2]
--> 434 raise exctype, value
    435 else:
    436 fielddict = ndarray.__getattribute__(self,'dtype').fields or {}

TypeError: Cannot change data-type for object array.

Is this a bug?

Thanks,

Thomas

--
View this message in context: http://old.nabble.com/Automatic-string-length-in-recarray-tp26174810p26199762.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
Re: [Numpy-discussion] Automatic string length in recarray
On Tue, Nov 3, 2009 at 11:43 AM, David Warde-Farley wrote:
> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
> > But if I want to specify the data types:
> >
> > np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
> > ('b',np.str)])
> >
> > the string field is set to a length of zero:
> >
> > rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
> >
> > I need to specify datatypes for all numerical types since I care about
> > int8/16/32, etc, but I would like to benefit from the auto string
> > length detection that works if I don't specify datatypes. I tried
> > replacing np.str by None but no luck. I know I can specify '|S5' for
> > example, but I don't know in advance what the string length should be
> > set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records i.e. max(len(foo[2]) for foo in
> records).
>

Not to shamelessly plug my own project... but more robust string type detection is one of the features of Tabular (http://bitbucket.org/elaine/tabular/), and is one of the (kinds of) reasons we wrote the package. Perhaps using Tabular could be useful to you?

Dan
Re: [Numpy-discussion] Automatic string length in recarray
On Nov 4, 2009, at 11:35 AM, Thomas Robitaille wrote:
>
> Pierre GM-2 wrote:
>>
>> As a workaround, perhaps you could use np.object instead of np.str
>> while defining your array. You can then get the maximum string length
>> by looping, as David suggested, and then use .astype to transform your
>> array...
>>
>
> I tried this:
>
> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
> ('b',np.object_)])
>
> but I get a TypeError:

Confirmed, it's a bug all right. Would you mind opening a ticket? I'll try to take care of it in the next few days.
Re: [Numpy-discussion] Automatic string length in recarray
Pierre GM-2 wrote:
>
> Confirmed, it's a bug all right. Would you mind opening a ticket?
> I'll try to take care of that in the next few days.
>

Done: http://projects.scipy.org/numpy/ticket/1283

Thanks!

Thomas