What you say is true: (i.~x) u/.y is equivalent to x u/.y, because "underneath" u/. does the i.~x as necessary. Writing the i.~ out explicitly causes no noticeable performance hit, because the leftmost i.~ in i.~i.~x is fast (its argument is just a list of small non-negative integers).
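A minimal J sketch of the points in this thread. It first checks the equivalence on Rob's example, then applies key to Mike's small data set without the redundant i.~. Finally it wraps the idea in a helper for an arbitrary number of key columns, which is what Mike asks for at the end. The names `k` and `groupby` are introduced here for illustration only: `groupby` is a hypothetical helper, not a library verb. The sketch assumes the key columns all have the same length and a compatible type, so that `>` can open them into one table.

```j
NB. Key (x u/. y) classifies y by the items of x itself, so applying i.~ first is redundant
k =: 1 2 3 1 3 2 1
(k </. 'abcdefg') -: (i.~ k) </. 'abcdefg'   NB. 1

NB. Mike's small example, with the redundant i.~ dropped
x =: 1 1 2 2 3 3 4 4
y =: 1 1 1 2 2 2 3 3
z =: 1 2 3 4 5 6 7 8
xy =: x ,. y
xy +//. z                       NB. 3 3 4 11 15
(~. xy) ,. xy +//. z            NB. distinct (x,y) rows stitched to their sums

NB. groupby: hypothetical helper for illustration, not part of the J library.
NB. Left argument: boxed list of equal-length key columns; right argument: value column.
NB. Result: one row per distinct key, boxed key beside the boxed group of values.
groupby =: 4 : 0
keys =. |: > x                  NB. one row per record, one column per key column
(<"1 ~. keys) ,. keys <@,/. y   NB. ~. and key both list groups in first-occurrence order
)
(x;y) groupby z
```

Note that `groupby` still classifies the key rows twice (once for `~.` and once inside `/.`), so it is a convenience for any number of columns rather than a speed-up.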
----- Original Message -----
From: Rob Hodgkinson <[EMAIL PROTECTED]>
Date: Friday, February 8, 2008 12:32
Subject: Re: [Jprogramming] Neophyte Performance Question - GROUPBY
To: Programming forum <[email protected]>

> Mike, something further to Devon's email...
>
> You do not need (i.~) in your expression...
>
>    ... (i.~ xy) sum/. z
> Becomes ...
>    xy sum/. z    NB. Same result
>
> See Vocabulary page on oblique (u/.) ... eg
>
>    1 2 3 1 3 2 1 </. 'abcdefg'
> +---+--+--+
> |adg|bf|ce|
> +---+--+--+
>    (i.~ 1 2 3 1 3 2 1) </. 'abcdefg'
> +---+--+--+
> |adg|bf|ce|
> +---+--+--+
>
> But this has no noticeable impact on overall performance.
>
> Rob Hodgkinson
>
>
> On 8/02/08 10:39 PM, "Mike Thompson" <[EMAIL PROTECTED]> wrote:
>
> > I'm experimenting with 'groupby-like' operations across columns of an
> > 'inverted table'.
> >
> > NB. Inverted table has three columns: x, y, z
> >    x =: 1 1 2 2 3 3 4 4
> >    y =: 1 1 1 2 2 2 3 3
> >    z =: 1 2 3 4 5 6 7 8
> >
> >    sum =: +/
> >
> >    xy =: x ,. y    NB. I want to groupby x and y
> >    (('x' , ' ', 'y'); 'sum z') ,: (~. xy) ; ,. (i.~ xy) sum/. z    NB. Sum z for distinct x, y pairs
> >
> > Which yields this table:
> >
> > +---+-----+
> > |x y|sum z|
> > +---+-----+
> > |1 1| 3   |
> > |2 1| 3   |
> > |2 2| 4   |
> > |3 2|11   |
> > |4 3|15   |
> > +---+-----+
> >
> > Thrilled that I can at least produce right answers, I now want to improve
> > the performance.
> >
> > So, I've been experimenting with:
> >
> >    x =: ? 10000000 $ 999
> >    y =: ? 10000000 $ 999
> >    z =: ? 10000000 $ 999
> >
> > Cutting away the formatting fluff from the table-forming expression above,
> > the CPU-consuming core is this:
> >
> >    (~. xy) ; (i.~ xy) sum/. z
> >
> > Any suggestions on how to do this more efficiently (faster)? For a start,
> > I feel as if my approach must be calculating the nub of xy twice.
> > Also, perhaps xy, as I've created it, is a poor choice of structure to
> > work with (I found the nub of x was massively faster to calculate than
> > the nub of xy):
> >
> >    Ts '~. x'
> > 0.21264 6784
> >
> >    Ts '~. xy'
> > 9.13063 2.68436e8
> >
> > Note: to simplify the explanation above, I used 'sum' but actually I want
> > to 'collect' partitions of z:
> >
> >    collect =: <@,
> >
> >    (i.~ xy) collect/. z
> > +---+-+-+---+---+
> > |1 2|3|4|5 6|7 8|
> > +---+-+-+---+---+
> >
> > Finally, I'm keen to have a generalised form of this groupby available to
> > me, i.e. group by an arbitrary number of columns, not just two.
> >
> > Many thanks for any insights,
> > Mike
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
