On Fri, Jul 1, 2011 at 11:14 AM, Hackett, John (Norcross, GA) <john.hack...@unisourceworldwide.com> wrote:
> After some experimentation (and judicious peeking at the source code), I
> think I’ve got the hang of writing custom functions to pass into these
> modules – basically, anything that accepts a list of values sliced from a
> single column on the structured array and returns a single list seems to
> work well. In functional programming terms, rec_summarize appears similar to
> “map”, rec_groupby appears similar to “reduce”.
>
> Now – what if I want to derive a calculation from multiple statistics in the
> original dataset – e.g. create a new column on the array which is derived
> from 2 (or up to n) other fields in a custom function which I pass into the
> process?
>
> For example, conditional counts/summaries (count transactions and sum the
> sales on all orders that weighed > 5K lbs).
>
> Is there a way to do this within numpy or mlab without going all the way out
> to python and creating a list comprehension?

There are a couple of ways to do this with the existing functions. One is
to use a logical mask::

    mask = r.weight > 5
    rg = mlab.rec_groupby(r[mask], groupby, stats)

You could also create a new categorical variable with one or more values,
attach it to your record array, and then use rec_groupby::

    heavy = np.where(r.weight > 5, 1, 0)

Add that to your record array::

    r = mlab.rec_append_fields(r, ['heavy'], [heavy])

and then call rec_groupby with 'heavy' as your group-by attribute.

Brian Schwartz has a preliminary implementation of rec_query, which lets
you run a SQL query on a record array by converting it to a sqlite table,
running the query, and returning the results as a new record array. That
would solve your problem more cleanly and generically. The code needs a
little more polishing, but perhaps, Brian, you can send over what you have
in case John wants to take a look.

JDH

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
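
For anyone who wants to try the logical-mask idea end to end, here is a minimal sketch in plain NumPy. The structured array, its field names (region, weight, sales), and the values are made up for illustration. The per-group step uses np.unique and np.bincount as a stand-in for mlab.rec_groupby; it is not that function's implementation.

```python
import numpy as np

# Hypothetical order data; field names and values are assumptions.
r = np.array(
    [("east", 7.0, 100.0),
     ("east", 3.0, 50.0),
     ("west", 9.0, 200.0),
     ("west", 6.0, 25.0),
     ("west", 1.0, 10.0)],
    dtype=[("region", "U8"), ("weight", "f8"), ("sales", "f8")],
)

# Logical mask: True for orders heavier than the threshold.
mask = r["weight"] > 5

# Conditional count and sum over the whole array.
heavy_count = int(mask.sum())
heavy_sales = float(r["sales"][mask].sum())

# Per-group version: map each region to an integer code, then
# accumulate counts and sales over the heavy orders only.
regions, codes = np.unique(r["region"], return_inverse=True)
counts = np.bincount(codes[mask], minlength=len(regions))
sales = np.bincount(codes[mask], weights=r["sales"][mask],
                    minlength=len(regions))

for name, n, s in zip(regions, counts, sales):
    print(name, n, s)
```

Passing minlength keeps a region in the output with a count of zero even when none of its orders pass the mask, which filtering the array before grouping would silently drop.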