Re: [Numpy-discussion] numpy.mean problems

Eraldo Pomponi Tue, 13 Dec 2011 11:04:32 -0800

Hi Fred,

I would suggest having a look at pandas (http://pandas.sourceforge.net/).
It was really helpful for me. It seems well suited to the type of data that
you are working with. It has nice "broadcasting" capabilities for applying
numpy functions to a given column:
http://pandas.sourceforge.net/basics.html#descriptive-statistics
http://pandas.sourceforge.net/basics.html#function-application
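
As a rough sketch of the idea (the sample values are made up, and only three
of the fields from your dtype are used, so treat this as an illustration
under those assumptions rather than a drop-in solution), something like the
following should give the per-column means for the rows with length >= 15:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `tab`: a structured array with three of the
# fields from the dtype quoted below. The values are invented for this example.
tab = np.array(
    [(1, 10, 90.0), (2, 15, 80.0), (3, 20, 70.0)],
    dtype=[("sgi", "<i8"), ("length", "<i8"), ("pident", "<f8")],
)

# Each field of the structured array becomes a DataFrame column.
df = pd.DataFrame.from_records(tab)

# Keep only the rows with length >= 15, then average every column.
means = df[df["length"] >= 15].mean()
print(means)
```

Here `means` is a pandas Series indexed by column name, so a single value can
be read with e.g. means["pident"].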


Cheers,
Eraldo


On Sun, Dec 11, 2011 at 1:49 PM, ferreirafm <ferreir...@lim12.fm.usp.br> wrote:

>
>
> Aronne Merrelli wrote:
> >
> > I can recreate this error if tab is a structured ndarray - what is the
> > dtype of tab?
> >
> > If that is correct, I think you could fix this by simplifying things.
> > Since tab is already an ndarray, you should not need to convert it back
> > into a python list. By converting the ndarray back to a list you are
> > making an extra level of "wrapping" as a python object, which is
> > ultimately why you get that error about adding numpy.void.
> >
> > Unfortunately you cannot directly take a mean of a struct dtype;
> > structs are generic so they could have fields with strings, or objects,
> > etc, that would be invalid for a mean calculation. However the following
> > code fragment should work pretty efficiently. It will make a 1-element
> > array of the same dtype as tab, and then populate it with the mean value
> > of all elements where the length is >= 15. Note that
> > dtype.fields.keys() gives you a nice way to iterate over the fields in
> > the struct dtype:
> >
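> > import numpy as np
> >
> > # Select the rows whose 'length' field is at least 15.
> > length_mask = tab['length'] >= 15
> > # One-element array with the same struct dtype, to hold the per-field means.
> > tab_means = np.zeros(1, dtype=tab.dtype)
> > for k in tab.dtype.fields.keys():
> >     tab_means[k] = np.mean(tab[k][length_mask])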
> >
> > In general this would not work if tab has a field that is not a simple
> > numeric type, such as a str, object, ... But it looks like your arrays
> > are all numeric from your example above.
> >
> > Hope that helps,
> > Aronne
> >
> Hi Aronne,
> Thanks for your reply. Indeed, tab is a mix of different column types:
> tab.dtype:
> [('sgi', '<i8'), ('length', '<i8'), ('nident', '<i8'), ('pident', '<f8'),
> ('positive', '<i8'), ('ppos', '<f8'), ('mismatch', '<i8'), ('qstart',
> '<i8'), ('qend', '<i8'), ('sstart', '<i8'), ('send', '<i8'), ('gapopen',
> '<i8'), ('gaps', '<i8'), ('evalue', '<f8'), ('bitscore', '<f8'), ('score',
> '<f8')]
> Interestingly, I wasn't able to import some columns of digits as
> strings, as one can with R data frame objects.
> I'll try to adapt your example to my needs and let you know the results.
> Regards.
>
> --
> View this message in context:
> http://old.nabble.com/numpy.mean-problems-tp32945124p32955052.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

