Re: [Matplotlib-users] Mlab - Rec_Summarize / Rec_GroupBy

John Hunter Fri, 01 Jul 2011 11:27:01 -0700

On Fri, Jul 1, 2011 at 11:14 AM, Hackett, John (Norcross, GA)
<john.hack...@unisourceworldwide.com> wrote:
> After some experimentation (and judicious peeking at the source code), I
> think I’ve got the hang of writing custom functions to pass into these
> modules – basically, anything that accepts a list of values sliced from a
> single column on the structured array and returns a single list seems to
> work well. In functional programming terms, rec_summarize appears similar to
> “map”, rec_groupby appears similar to “reduce”.
>
> Now – what if I want to derive a calculation from multiple statistics in the
> original dataset – e.g., create a new column on the array which is derived
> from 2 (or up to n) other fields in a custom function which I pass into the
> process?
>
> For example, conditional counts/summaries (count transactions and sum the
> sales on all orders that weighed > 5K lbs).
>
> Is there a way to do this within numpy or mlab without going all the way out
> to python and creating a list comprehension?


There are a couple of ways with the existing functions.

One is to use a logical mask::

   mask = r.weight>5
   rg = mlab.rec_groupby(r[mask], groupby, stats)
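
As a rough, self-contained illustration of the mask idea, the sketch below filters first and then computes a per-group count and sum with plain NumPy. The sample data and the field names ``region``, ``weight`` and ``sales`` are hypothetical, and ``np.unique``/``np.bincount`` stand in for ``mlab.rec_groupby`` so the example runs without the mlab helpers.

```python
import numpy as np

# Hypothetical sample data; the field names are illustrative only.
r = np.array(
    [("east", 7.0, 100.0),
     ("east", 2.0, 50.0),
     ("west", 9.0, 200.0),
     ("west", 6.0, 25.0),
     ("west", 1.0, 10.0)],
    dtype=[("region", "U4"), ("weight", "f8"), ("sales", "f8")],
).view(np.recarray)

mask = r.weight > 5      # True for the rows that meet the condition
heavy_rows = r[mask]     # only these rows take part in the grouping

# Group the filtered rows by region: number of orders and total sales.
regions, inverse = np.unique(heavy_rows.region, return_inverse=True)
counts = np.bincount(inverse)
totals = np.bincount(inverse, weights=heavy_rows.sales)

summary = {str(k): (int(n), float(s))
           for k, n, s in zip(regions, counts, totals)}
print(summary)
```

Because the filter runs before the grouping, groups with no matching rows simply do not appear in the result.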

You could also create a new categorical variable with one or more
values and attach it to your record array and then use rec_groupby::

  heavy = np.where(r.weight>5, 1, 0)

and add that to your record array::

  r = mlab.rec_append_fields(r, ['heavy'], [heavy])

and then do a rec_groupby using 'heavy' as your groupby attribute.
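
The flag-column variant can be sketched the same way. This is only an illustration under stated assumptions: the data and field names are made up, ``numpy.lib.recfunctions.append_fields`` is used in place of ``mlab.rec_append_fields``, and a small loop stands in for ``rec_groupby``, so that the example depends on NumPy alone.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample data; the field names are illustrative only.
r = np.array(
    [(7.0, 100.0), (2.0, 50.0), (9.0, 200.0), (6.0, 25.0), (1.0, 10.0)],
    dtype=[("weight", "f8"), ("sales", "f8")],
).view(np.recarray)

# 1 for heavy orders, 0 otherwise.
heavy = np.where(r.weight > 5, 1, 0)

# Attach the flag as a new field (substitute for mlab.rec_append_fields).
r2 = rfn.append_fields(r, "heavy", heavy, usemask=False, asrecarray=True)

# Group on the flag: count and total sales for each value of 'heavy'.
stats = {}
for flag in np.unique(r2.heavy):
    rows = r2[r2.heavy == flag]
    stats[int(flag)] = (len(rows), float(rows.sales.sum()))
print(stats)
```

Unlike the mask approach, this keeps the rows that fail the condition, so both the heavy and the non-heavy summaries come out of one pass.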

Brian Schwartz has a preliminary implementation of rec_query which
allows you to make a SQL query on a record array by converting it to a
SQLite table, running the SQL query, and returning the results as a
new record array, which would solve your problem more cleanly and
generically.  The code needs a little more polishing, but perhaps
Brian, you can send over what you have in case John wants to take a
look.
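
To make the round trip concrete, here is a minimal sketch of the general idea using only the standard ``sqlite3`` module and NumPy. It is not Brian's implementation: the helper name ``rec_query_sketch``, the table name ``r`` and the sample data are all hypothetical, and the helper assumes the query returns at least one row.

```python
import sqlite3
import numpy as np

def rec_query_sketch(rec, sql, table="r"):
    """Hypothetical helper: load a record array into an in-memory
    SQLite table, run `sql` against it, return a new record array."""
    names = rec.dtype.names
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join('"%s"' % n for n in names)
        conn.execute("CREATE TABLE %s (%s)" % (table, cols))
        marks = ", ".join("?" * len(names))
        # tolist() yields tuples of plain Python scalars, one per row.
        conn.executemany(
            "INSERT INTO %s VALUES (%s)" % (table, marks), rec.tolist())
        cur = conn.execute(sql)
        out_names = [d[0] for d in cur.description]
        rows = cur.fetchall()
    finally:
        conn.close()
    # Assumes a non-empty result; column types are inferred by NumPy.
    return np.rec.fromrecords(rows, names=out_names)

# Hypothetical sample data; the field names are illustrative only.
r = np.array(
    [("east", 7.0, 100.0), ("east", 2.0, 50.0), ("west", 9.0, 200.0),
     ("west", 6.0, 25.0), ("west", 1.0, 10.0)],
    dtype=[("region", "U4"), ("weight", "f8"), ("sales", "f8")],
)

out = rec_query_sketch(
    r,
    "SELECT region, COUNT(*) AS n, SUM(sales) AS total "
    "FROM r WHERE weight > 5 GROUP BY region ORDER BY region",
)
print(out)
```

The condition, the grouping and the aggregates all live in one SQL statement, which is what makes this route more generic than building masks or flag columns by hand.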

JDH
