Hi,

I've been mulling over ideas for a couple of new ufunc methods, plus a
couple of NumPy functions, that I think would help implement group-by
operations on NumPy arrays.

I wanted to discuss them on this list to get input from others before
putting forward an actual proposal or patch.

The group-by operation is very common in relational algebra, and NumPy
arrays (especially structured arrays) can often be seen as database
tables. There are common, easy-to-implement approaches for select and
other relational-algebra concepts, but group-by you basically have to
implement yourself.
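For comparison, here is roughly what a hand-rolled group-by looks like with existing NumPy calls. This is a minimal sketch: the helper name groupby_sum is hypothetical, and it only covers summation, because np.bincount can only add up weights per integer label.

```python
import numpy as np

def groupby_sum(values, keys):
    """Hypothetical helper: sum `values` within each group of equal `keys`.

    Returns (distinct_keys, sums), with distinct_keys in sorted order.
    """
    values = np.asarray(values, dtype=float)
    # uniq holds the distinct keys (sorted); inverse gives, for every
    # element, the index of its key in uniq.
    uniq, inverse = np.unique(keys, return_inverse=True)
    # bincount adds up the weights that share the same integer label.
    sums = np.bincount(inverse, weights=values, minlength=len(uniq))
    return uniq, sums

keys = np.array([10, 20, 10, 20, 30, 30, 10])
vals = np.array([1, 2, 3, 4, 5, 6, 7])
groups, totals = groupby_sum(vals, keys)
# groups holds 10, 20, 30 and totals holds 11.0, 6.0, 11.0
```

Any other reduction (max, min, product) needs a different trick, which is part of why a general ufunc method is attractive.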

Here are my suggested additions to NumPy:

ufunc methods:
        * reduceby(array, by, sorted=1, axis=0)

              array is the array to reduce.
              by is the array providing the grouping (it can be a
              structured array or a list of arrays).

              If sorted is 1, a faster algorithm can possibly be used.
        
        * reducein(array, indices, axis=0)

              Similar to reduceat, but the indices provide both the
              start and end points (rather than being fence posts as
              in reduceat).
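To make the intended semantics of the two ufunc methods concrete, they can be approximated as plain Python functions that take a ufunc. This is a sketch under assumptions, not a proposed implementation: the names mirror the proposal but the signatures are hypothetical, only a 1-D array with a single non-empty 1-D key array is handled (no structured or multi-array keys, no axis), and reducein's (start, end) pairs are assumed to be half-open like Python slices.

```python
import numpy as np

def reduceby(ufunc, array, by, sorted=False):
    """Hypothetical stand-in for the proposed ufunc.reduceby.

    Handles only a 1-D `array` with a single non-empty 1-D key array `by`.
    Returns (group_keys, reduced), one entry per group.
    """
    array = np.asarray(array)
    by = np.asarray(by)
    if not sorted:
        # A stable sort brings equal keys together while keeping the
        # original order of elements inside each group.
        order = np.argsort(by, kind="stable")
        array, by = array[order], by[order]
    # Fence posts: position 0 plus every position where the key changes.
    starts = np.flatnonzero(np.r_[True, by[1:] != by[:-1]])
    return by[starts], ufunc.reduceat(array, starts)

def reducein(ufunc, array, indices):
    """Hypothetical stand-in for the proposed ufunc.reducein.

    `indices` is a sequence of (start, end) pairs; each pair is treated
    as the half-open slice array[start:end] (an assumption).
    """
    array = np.asarray(array)
    return np.array([ufunc.reduce(array[start:end]) for start, end in indices])

keys = [10, 20, 10, 20, 30, 30, 10]
vals = [1, 2, 3, 4, 5, 6, 7]
print(reduceby(np.add, vals, keys))              # per-group sums
print(reduceby(np.maximum, vals, keys))          # per-group maxima
print(reducein(np.add, vals, [(0, 3), (2, 5)]))  # overlapping ranges
```

Skipping the argsort when sorted is true is where the speedup would come from; in this sketch it only requires equal keys to be adjacent. Because reducein takes explicit start and end points, ranges may overlap or leave gaps: the last call sums elements 0 to 2 and 2 to 4.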

numpy functions (or methods):

         * segment(array)

           Produce an array of integers identifying the different
           "regions" of an array; for example:

             segment([10,20,10,20,30,30,10]) would produce [0,1,0,1,2,2,0]


         * edges(array, at=True)

           Produce an index array giving the edges, either as fence posts
           (the syntax reduceat expects) or with both boundaries (as
           reducein expects).
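A similar sketch for the two proposed functions, under stated assumptions: segment numbers the distinct values in order of first appearance (the example above is also consistent with sorted order), edges marks positions where consecutive values differ, and at=True is taken to mean fence-post output for reduceat while at=False returns (start, end) pairs for reducein. Input is assumed 1-D and non-empty; both function bodies are hypothetical.

```python
import numpy as np

def segment(array):
    """Hypothetical sketch of the proposed segment(): label each element
    with an integer identifying its distinct value, numbering values in
    order of first appearance (an assumption)."""
    array = np.asarray(array)
    uniq, first, inverse = np.unique(array, return_index=True,
                                     return_inverse=True)
    # rank[i] = position of uniq[i] when values are ordered by first appearance
    rank = np.empty(len(uniq), dtype=np.intp)
    rank[np.argsort(first)] = np.arange(len(uniq))
    return rank[inverse]

def edges(array, at=True):
    """Hypothetical sketch of the proposed edges() for a 1-D, non-empty array.

    Edges are taken to be positions where consecutive values differ.
    at=True returns fence posts (run starts) for reduceat; at=False
    returns one (start, end) row per run, end exclusive.
    """
    array = np.asarray(array)
    starts = np.flatnonzero(np.r_[True, array[1:] != array[:-1]])
    if at:
        return starts
    ends = np.r_[starts[1:], len(array)]
    return np.column_stack((starts, ends))

keys = np.array([10, 20, 10, 20, 30, 30, 10])
vals = np.array([1, 2, 3, 4, 5, 6, 7])
labels = segment(keys)                   # [0, 1, 0, 1, 2, 2, 0]
order = np.argsort(labels, kind="stable")
# Sorting by label makes each group contiguous; edges() then supplies
# the fence posts that np.add.reduceat needs to sum each group.
group_sums = np.add.reduceat(vals[order], edges(labels[order]))
```

The last three lines show how the pieces could combine into a group-by using only reduceat; the proposed reduceby would fold these steps into one call.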


Thoughts?

-Travis

--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliph...@enthought.com

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion