[Numpy-discussion] Automatic string length in recarray

2009-11-02 Thread Thomas Robitaille
Hi,

I'm having trouble creating np.string_ fields in recarrays. If I
create a recarray using

np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])

the result looks fine:

rec.array([(1, 'hello'), (2, 'world')], dtype=[('a', [...]


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-03 Thread David Warde-Farley
On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:

> But if I want to specify the data types:
>
> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
> ('b',np.str)])
>
> the string field is set to a length of zero:
>
> rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
>
> I need to specify datatypes for all numerical types since I care about
> int8/16/32, etc, but I would like to benefit from the auto string
> length detection that works if I don't specify datatypes. I tried
> replacing np.str by None but no luck. I know I can specify '|S5' for
> example, but I don't know in advance what the string length should be
> set to.

This is a limitation of the way the dtype code works, and AFAIK
there's no easy fix. In some code I wrote recently I had to loop
through the entire list of records to find the longest string, e.g.
max(len(foo[2]) for foo in records) when the string is in column 2.
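
A minimal sketch of that looping approach, assuming Python 3 and a recent NumPy (so the string field is 'U' rather than the 'S' type shown above). The helper name fromrecords_auto_str and the convention of passing plain str to mean "detect the length" are hypothetical, not part of NumPy:

```python
import numpy as np

def fromrecords_auto_str(records, fields):
    """Build a recarray, sizing each `str` field to its longest value.

    `fields` is a list of (name, type) pairs. A type of `str` means
    "string, length to be detected". Hypothetical helper, not NumPy API.
    """
    dtype = []
    for i, (name, typ) in enumerate(fields):
        if typ is str:
            # Scan column i of every record for the longest string.
            width = max((len(rec[i]) for rec in records), default=1)
            # Never emit a zero-width field, even if all strings are empty.
            typ = "U%d" % max(width, 1)
        dtype.append((name, typ))
    return np.rec.fromrecords(records, dtype=dtype)

rec = fromrecords_auto_str([(1, "hello"), (2, "world")],
                           [("a", np.int8), ("b", str)])
```

The numeric fields keep exactly the types that were requested; only the fields marked with str are measured, at the cost of one extra pass over the records per string column.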

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-03 Thread Pierre GM

On Nov 3, 2009, at 11:43 AM, David Warde-Farley wrote:

> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
>> But if I want to specify the data types:
>>
>> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
>> ('b',np.str)])
>>
>> the string field is set to a length of zero:
>>
>> rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
>>
>> I need to specify datatypes for all numerical types since I care  
>> about
>> int8/16/32, etc, but I would like to benefit from the auto string
>> length detection that works if I don't specify datatypes. I tried
>> replacing np.str by None but no luck. I know I can specify '|S5' for
>> example, but I don't know in advance what the string length should be
>> set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records i.e. max(len(foo[2]) for foo in
> records).

As a workaround, perhaps you could use np.object instead of np.str
while defining your array. You can then get the maximum string length
by looping, as David suggested, and then use .astype to transform your
array...
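
A rough sketch of this two-step idea, assuming Python 3 and a recent NumPy. It departs from the suggestion in two labeled ways: the temporary array is built with np.array instead of np.rec.fromrecords, and the final conversion copies field by field instead of calling .astype on the whole structured array. Both choices are assumptions made to keep the example simple:

```python
import numpy as np

records = [(1, "hello"), (2, "world")]

# Step 1: hold the strings as Python objects so no length is needed yet.
# (Assumption: plain np.array is used here, not np.rec.fromrecords.)
tmp = np.array(records, dtype=[("a", np.int8), ("b", object)])

# Step 2: find the longest string in the object field.
width = max(len(s) for s in tmp["b"])

# Step 3: copy into an array whose string field has that fixed width.
# 'U' is the Python 3 text type; the thread's '|S5' is its bytes analogue.
final_dtype = np.dtype([("a", np.int8), ("b", "U%d" % width)])
out = np.empty(tmp.shape, dtype=final_dtype)
for name in final_dtype.names:
    out[name] = tmp[name]

# View the result as a recarray to get attribute access (rec.a, rec.b).
rec = out.view(np.recarray)
```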



Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Thomas Robitaille


Pierre GM-2 wrote:
> 
> As a workaround, perhaps you could use np.object instead of np.str
> while defining your array. You can then get the maximum string length
> by looping, as David suggested, and then use .astype to transform your
> array...
> 

I tried this:

np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),('b',np.object_)])

but I get a TypeError:

---
TypeError                                 Traceback (most recent call last)

/Users/tom/ in ()

/Users/tom/Library/Python/2.6/site-packages/numpy/core/records.pyc in fromrecords(recList, dtype, shape, formats, names, titles, aligned, byteorder)
    625     res = retval.view(recarray)
    626
--> 627     res.dtype = sb.dtype((record, res.dtype))
    628     return res
    629

/Users/tom/Library/Python/2.6/site-packages/numpy/core/records.pyc in __setattr__(self, attr, val)
    432             if attr not in fielddict:
    433                 exctype, value = sys.exc_info()[:2]
--> 434                 raise exctype, value
    435         else:
    436             fielddict = ndarray.__getattribute__(self,'dtype').fields or {}

TypeError: Cannot change data-type for object array.

Is this a bug?

Thanks,

Thomas


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Dan Yamins
On Tue, Nov 3, 2009 at 11:43 AM, David Warde-Farley wrote:

> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
> > But if I want to specify the data types:
> >
> > np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
> > ('b',np.str)])
> >
> > the string field is set to a length of zero:
> >
> > rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
> >
> > I need to specify datatypes for all numerical types since I care about
> > int8/16/32, etc, but I would like to benefit from the auto string
> > length detection that works if I don't specify datatypes. I tried
> > replacing np.str by None but no luck. I know I can specify '|S5' for
> > example, but I don't know in advance what the string length should be
> > set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records i.e. max(len(foo[2]) for foo in
> records).
>
>
Not to shamelessly plug my own project, but more robust string type
detection is one of the features of Tabular
(http://bitbucket.org/elaine/tabular/), and it is one of the kinds of
reasons we wrote the package. Perhaps Tabular could be useful to you?

Dan


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Pierre GM

On Nov 4, 2009, at 11:35 AM, Thomas Robitaille wrote:

>
>
> Pierre GM-2 wrote:
>>
>> As a workaround, perhaps you could use np.object instead of np.str
>> while defining your array. You can then get the maximum string length
>> by looping, as David suggested, and then use .astype to transform
>> your array...
>>
>
> I tried this:
>
> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8), 
> ('b',np.object_)])
>
> but I get a TypeError:

Confirmed, it's a bug all right. Would you mind opening a ticket?
I'll try to take care of that in the next few days.


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Thomas Robitaille


Pierre GM-2 wrote:
> 
> Confirmed, it's a bug all right. Would you mind opening a ticket?
> I'll try to take care of that in the next few days.
> 

Done - http://projects.scipy.org/numpy/ticket/1283

Thanks!

Thomas
