Hi, I've been mulling over a couple of ideas for new ufunc methods plus a couple of numpy functions that I think will help implement group-by operations with NumPy arrays.
Before putting forward an actual proposal or patch, I wanted to discuss them on this list to get input from others.

The group-by operation is very common in relational algebra, and NumPy arrays (especially structured arrays) can often be seen as database tables. There are common, easy-to-implement approaches for select and other relational algebra concepts, but group-by basically has to be implemented yourself.

Here are my suggested additions to NumPy:

ufunc methods:

* reduceby(array, by, sorted=1, axis=0)

    array is the array to reduce
    by is the array that provides the grouping (can be a structured array or a list of arrays)
    if sorted is 1, a faster algorithm can possibly be used

* reducein(array, indices, axis=0)

    similar to reduceat, but the indices provide both the start and end points (rather than being fence-posts as in reduceat)

numpy functions (or methods):

* segment(array)

    produces an array of integers labeling the different "regions" of an array: segment([10,20,10,20,30,30,10]) would produce [0,1,0,1,2,2,0]

* edges(array, at=True)

    produces an index array giving the edges (either in the fence-post form used by reduceat, or as both boundaries for reducein)

Thoughts on the general idea?

-Travis

--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliph...@enthought.com
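
P.S. To make the intended semantics a bit more concrete, here is a rough pure-Python sketch that emulates the four ideas with calls NumPy already has (np.unique, ufunc.reduce, ufunc.reduceat). It rests on several assumptions: reduceby and reducein are written as free functions taking the ufunc as a hypothetical first argument instead of being real ufunc methods, only 1-D non-empty inputs and a single key array are handled, and the sorted and axis arguments are left out. It is only meant to show the behaviour I have in mind, not how it would actually be implemented.

```python
import numpy as np


def segment(values):
    """Label each element with an integer identifying its distinct value."""
    # return_inverse gives, for each element, its index among the sorted
    # unique values, which serves as a group label.
    _, labels = np.unique(np.asarray(values), return_inverse=True)
    return labels


def edges(values, at=True):
    """Boundaries of runs of equal consecutive values (1-D, non-empty)."""
    values = np.asarray(values)
    # A run starts at index 0 and wherever a value differs from its predecessor.
    change = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], change))
    if at:
        # Fence-post form, usable directly with ufunc.reduceat.
        return starts
    # (start, stop) pairs, the form reducein would take.
    stops = np.concatenate((change, [len(values)]))
    return np.column_stack((starts, stops))


def reducein(ufunc, array, indices):
    """Reduce array[start:stop] for every (start, stop) pair (assumed form)."""
    array = np.asarray(array)
    return np.array([ufunc.reduce(array[start:stop]) for start, stop in indices])


def reduceby(ufunc, array, by):
    """Reduce `array` within the groups defined by `by` (assumed form)."""
    array = np.asarray(array)
    labels = segment(by)
    # A stable sort brings members of each group together, keeping their order.
    order = np.argsort(labels, kind="stable")
    starts = edges(labels[order], at=True)
    return ufunc.reduceat(array[order], starts)


keys = [10, 20, 10, 20, 30, 30, 10]
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

print(segment(keys))                  # [0 1 0 1 2 2 0]
print(reduceby(np.add, data, keys))   # [11.  6. 11.]
print(edges([10, 10, 20, 30, 30]))    # [0 2 3]
print(reducein(np.add, np.arange(6), [(0, 2), (1, 4)]))  # [1 6]
```

Note that in the last call the (start, stop) ranges overlap, which is something the fence-post indices of reduceat cannot express directly.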