Re: [Jprogramming] Neophyte Performance Question - GROUPBY

Roger Hui Fri, 08 Feb 2008 13:29:47 -0800

What you say is true, that (i.~x) u/.y is equivalent to x u/.y ,
because "underneath" it does the i.~x as necessary.
There is not a noticeable hit in performance because
the leftmost i.~ in i.~i.~x is fast.




----- Original Message -----
From: Rob Hodgkinson <[EMAIL PROTECTED]>
Date: Friday, February 8, 2008 12:32
Subject: Re: [Jprogramming] Neophyte Performance Question - GROUPBY
To: Programming forum <[email protected]>

> Mike, something further to Devon's email...
> 
> You do not need (i.~) in your expression...
> >    ...  (i.~ xy) sum/. z
> Becomes ...     xy sum/. 
> z       NB. Same result
> 
> See Vocabulary page on oblique (u/.) ... eg
> 
>    1 2 3 1 3 2 1 </. 'abcdefg'
> +---+--+--+
> |adg|bf|ce|
> +---+--+--+
>    (i.~ 1 2 3 1 3 2 1) </. 'abcdefg'
> +---+--+--+
> |adg|bf|ce|
> +---+--+--+
> 
> But this has no noticeable impact on overall performance.
> 
> Rob Hodgkinson
> 
> 
> On 8/02/08 10:39 PM, "Mike Thompson" 
> <[EMAIL PROTECTED]> wrote:
> 
> > 
> > I'm experimenting with 'groupby-like' operations across 
> columns of an
> > 'inverted table'.
> > 
> >    NB.  Inverted table has three columns: 
> x,  y,  z
> >    x =: 1 1 2 2 3 3 4 4
> >    y =: 1 1 1 2 2 2 3 3
> >    z =: 1 2 3 4 5 6 7 8
> > 
> >    sum =: +/
> > 
> >    xy =:  x ,. 
> y      NB.  I want to groupby x 
> and y
> >    (('x' , ' ', 'y'); 'sum z') ,: (~. xy) ; 
> ,.  (i.~ xy) sum/. z     NB.
> > Sum z for distinct x, y pairs
> > 
> > Which yields this table:
> > 
> > ----T-----┐
> > │x y│sum z│
> > +---+-----+
> > │1 1│ 3   │
> > │2 1│ 3   │
> > │2 2│ 4   │
> > │3 2│11   │
> > │4 3│15   │
> > L---+------
> > 
> > Thrilled that I can at least produce right answers, I now want 
> to improve
> > the performance.
> > 
> > So, I've been experiementing with:
> > 
> >     x =: ? 10000000 $ 999
> >     y =: ? 10000000 $ 999
> >     z =: ? 10000000 $ 999
> > 
> > Cutting away the formatting fluff from the table-forming 
> expression above,
> > the CPU consuming core is this:
> >      
> >       (~. xy) ;  (i.~ xy) 
> sum/. z
> > 
> > Any suggestions on how to do this more efficiently 
> (faster)?   For a start,
> > I feel as if my 
> > approach must be calculating the nub of xy twice.  Also, 
> perhaps xy, as I've
> > created it, is a poor choice
> > of structure to work with (I found the nub of x was massively 
> faster to
> > calculate than the nub of xy):
> > 
> >      Ts '~. x'
> > 0.21264 6784
> > 
> >    Ts '~. xy' 
> > 9.13063 2.68436e8
> > 
> > 
> > Note:  to simplify the explanation above, I used 'sum' 
> but actually I want
> > to 'collect' partitions of z:
> > 
> > collect =: <@,  
> > 
> > (i.~ xy) collect/. z
> > 
> > ----T-T-T---T---┐
> > │1 2│3│4│5 6│7 8│
> > L---+-+-+---+----
> > 
> > Finally, I'm keen to have a generalised form of this groupby 
> available to
> > me.  Ie. Group by an arbitrary number of columns, not 
> just two.
> > 
> > Many thanks for any insights,
> > Mike
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Neophyte Performance Question - GROUPBY

Reply via email to