Re: [Numpy-discussion] numpy.mean problems
Hi Eraldo,

Indeed, pandas is a really nice module! If part of it is going to become part of numpy, that's even better. Thanks for the suggestion.

All the best,
Fred

Eraldo Pomponi wrote:

Hi Fred,

Pandas has a nice interface to PyTables if you still need it:
http://pandas.sourceforge.net/io.html#hdf5-pytables

However, my intention was just to point you to pandas because it is a really powerful tool if you need to deal with heterogeneous tabular data. It is also worth noting that there are plans in the numpy community to include/port part of this package directly into the codebase. This says a lot about how good it is...

Best,
Eraldo

--
View this message in context: http://old.nabble.com/numpy.mean-problems-tp32945124p32975342.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.mean problems
Hi Eraldo,

Thanks for your suggestion. I was using PyTables but gave up after learning that some very useful capabilities are sold as a professional package. However, it is still useful for many printing and data manipulation tasks, and it can also handle extremely large datasets (which is not my case).

Regards,
Fred

Eraldo Pomponi wrote:

I would suggest you have a look at pandas (http://pandas.sourceforge.net/). It was really helpful for me. It seems well suited to the type of data that you are working with. It has nice broadcasting capabilities for applying numpy functions to a set of columns.

http://pandas.sourceforge.net/basics.html#descriptive-statistics
http://pandas.sourceforge.net/basics.html#function-application

Cheers,
Eraldo

--
View this message in context: http://old.nabble.com/numpy.mean-problems-tp32945124p32970295.html
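The descriptive-statistics and function-application features Eraldo points to can be sketched roughly as follows. This is only an illustration written against the current pandas API, which may differ from the 2011 release discussed in the thread; the three-field array, its values, and the `>= 15` filter are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical structured array with made-up values (not Fred's data).
tab = np.array(
    [(12, 91.5, 40.0), (15, 88.0, 55.5), (20, 92.0, 61.0)],
    dtype=[('length', 'i8'), ('pident', 'f8'), ('bitscore', 'f8')],
)

# A DataFrame can be built directly from a structured array;
# the field names become the column names.
df = pd.DataFrame(tab)

# Filter rows, then take the mean of every column at once.
means = df[df['length'] >= 15].mean()

# Apply a NumPy function to each column in turn.
col_max = df.apply(np.max)

print(means)
print(col_max)
print(df.describe())
```

Because each DataFrame column has a single type, column-wise reductions such as `mean()` avoid the mixed-dtype problem that a reduction over a whole structured array runs into.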
Re: [Numpy-discussion] numpy.mean problems
Aronne Merrelli wrote:

I can recreate this error if tab is a structured ndarray. What is the dtype of tab?

If that is correct, I think you could fix this by simplifying things. Since tab is already an ndarray, you should not need to convert it back into a Python list. By converting the ndarray back to a list you are adding an extra level of wrapping as a Python object, which is ultimately why you get that error about adding numpy.void.

Unfortunately, you cannot directly take the mean of a struct dtype; structs are generic, so they could have fields with strings, objects, etc., that would be invalid for a mean calculation. However, the following code fragment should work pretty efficiently. It makes a 1-element array of the same dtype as tab, and then populates it with the mean value of all elements where the length is >= 15. Note that dtype.fields.keys() gives you a nice way to iterate over the fields in the struct dtype:

    # The archive stripped the comparison operator; ">=" is assumed here.
    length_mask = tab['length'] >= 15
    tab_means = np.zeros(1, dtype=tab.dtype)
    for k in tab.dtype.fields.keys():
        tab_means[k] = np.mean(tab[k][length_mask])

In general this would not work if tab has a field that is not a simple numeric type, such as a str, object, ... But it looks like your arrays are all numeric from your example above.

Hope that helps,
Aronne

Hi Aronne,

Thanks for your reply. Indeed, tab is a mix of different column types:

tab.dtype:
    [('sgi', 'i8'), ('length', 'i8'), ('nident', 'i8'), ('pident', 'f8'),
     ('positive', 'i8'), ('ppos', 'f8'), ('mismatch', 'i8'), ('qstart', 'i8'),
     ('qend', 'i8'), ('sstart', 'i8'), ('send', 'i8'), ('gapopen', 'i8'),
     ('gaps', 'i8'), ('evalue', 'f8'), ('bitscore', 'f8'), ('score', 'f8')]

Interestingly, I wasn't able to import some columns of digits as strings, as one can with R data frame objects.

I'll try to adapt your example to my needs and let you know the results.

Regards.
--
View this message in context: http://old.nabble.com/numpy.mean-problems-tp32945124p32955052.html
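Aronne's per-field approach can be tried on a small synthetic array. The sketch below is a Python 3 illustration under stated assumptions: the three-field dtype, the sample values, and the `>=` comparison are invented for the example. It also departs from the fragment in one respect, storing the results in an all-float dtype rather than in `tab.dtype`, because assigning a fractional mean into an integer field would truncate it.

```python
import numpy as np

# Hypothetical stand-in for `tab`: three of the fields named in the thread,
# filled with made-up values.
tab = np.array(
    [(101, 12, 91.5), (102, 15, 88.0), (103, 20, 92.0), (104, 30, 96.0)],
    dtype=[('sgi', 'i8'), ('length', 'i8'), ('pident', 'f8')],
)

# Boolean mask selecting the rows of interest (operator assumed to be >=).
length_mask = tab['length'] >= 15

# One-element result array with the same field names but float fields,
# so that means of integer columns keep their fractional part.
mean_dtype = [(name, 'f8') for name in tab.dtype.names]
tab_means = np.zeros(1, dtype=mean_dtype)

# Reduce each field separately; every field is a plain numeric array.
for name in tab.dtype.names:
    tab_means[name] = np.mean(tab[name][length_mask])

print(tab_means)
```

The loop touches each column once and the mask is computed a single time, so no Python-level iteration over rows is needed.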
[Numpy-discussion] numpy.mean problems
Hi everyone,

I'm quite new to both numpy and Python. Could someone please tell me what I'm doing wrong? Here is my piece of code:

    def stats(filename):
        """Utility to perform some basic statistics on columns."""
        tab = get_textab(filename)
        stat_list = []
        for row in sort_tab(tab):
            # The archive stripped the comparison operator; ">=" is assumed here.
            if row['length'] >= 15:
                stat_list.append(row)
        stat_array = np.array(stat_list)
        print type(sort_tab(tab))
        print type(stat_array)
        #print stat_array.mean(axis=0)
        print np.mean(stat_array, axis=0)

Which results in:

    <type 'numpy.ndarray'>
    <type 'numpy.ndarray'>
    Traceback (most recent call last):
      File "/home/ferreirafm/bin/cross.py", line 213, in <module>
        main()
      File "/home/ferreirafm/bin/cross.py", line 204, in main
        stats(filename)
      File "/home/ferreirafm/bin/cross.py", line 146, in stats
        print np.mean(stat_array, axis=0)
      File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 2374, in mean
        return mean(axis, dtype, out)
    TypeError: unsupported operand type(s) for +: 'numpy.void' and 'numpy.void'

--
View this message in context: http://old.nabble.com/numpy.mean-problems-tp32945124p32945124.html
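The failure reported above can be reproduced without the helper functions. The following is a minimal Python 3 sketch, assuming a hypothetical two-field table with made-up values; the exact error message depends on the NumPy version, so only the exception type is checked. It also shows a boolean mask doing the row selection that the loop in the post performs by hand.

```python
import numpy as np

# Hypothetical two-field table standing in for the data in the post.
tab = np.array(
    [(12, 91.5), (15, 88.0), (20, 92.0)],
    dtype=[('length', 'i8'), ('pident', 'f8')],
)

# Each element of a structured array is a numpy.void record, and records
# cannot be added together, so a reduction over whole rows fails.
try:
    np.mean(tab, axis=0)
    failed = False
except TypeError as exc:
    failed = True
    print("np.mean on a structured array raised:", type(exc).__name__)

# A boolean mask replaces the Python loop and the list round trip.
selected = tab[tab['length'] >= 15]

# A single field is an ordinary numeric array, so its mean is well defined.
print(selected['pident'].mean())
```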