Thanks.  I'm running into this because the items in my large lists are
complex data structures that have nested boxing.

I ended up using the following as a replacement for ~:
   ns =: (] (/: /:)~ 1: , 2: ([: -. -:)/\ /:~)
It seems to give the same results as ~:, at least in the context of my
application.
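
To show how it works, here is a step-by-step sketch on a small made-up
list (the names v, s and m are only for illustration): sort the items,
mark each one that differs from its predecessor, then put the marks back
into the original order.
   v =: 3 1 3 2 1
   ] s =: /:~ v              NB. sorted items
1 1 2 3 3
   ] m =: 1 , 2 -.@-:/\ s    NB. 1 where an item does not match the one before it
1 0 1 1 0
   m /: /: v                 NB. undo the sort
1 1 0 1 0
   ~: v
1 1 0 1 0
Because grade is stable, the first of several matching items also comes
first in s, so it is the one that ends up marked.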

'ns' can be much faster than ~: for large lists of deeply boxed data:
   6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
1106.85
   6!:2 '+/ ns <"0 <"2 (?. 100000 4 4 $ 2)'
2.08842

On simpler unboxed data, however, it is slower:
   6!:2 '+/ ~: (?. 100000 4 4 $ 2)'
0.0196997
   6!:2 '+/ ns (?. 100000 4 4 $ 2)'
0.0867839
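
For data as simple as these test arrays, the keying-list idea quoted below
could be sketched as follows.  This assumes every item reduces to one
simple value (here the 16 bits of each table read as a base-2 integer);
the names d and keys are made up for the example, and it would not carry
over directly to my real data.
   d =: <"0 <"2 ?. 1000 4 4 $ 2
   keys =: #.@,"2 > > d      NB. one integer per doubly boxed item
   (~: keys) -: ~: d         NB. does the sieve on the keys match the sieve on d?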

I don't think that 'ns' is an exact drop-in replacement for ~: though.
For example, with arrays of floating-point values that are close enough to
be on the edge of tolerance, I imagine 'ns' could give different results
than ~:
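
A quick way to check agreement on a given data set is the fork below.  The
second example is only a sketch of the kind of case I have in mind, with
made-up values: neighbours are 3e_14 apart, inside the default tolerance
of 2^_44 (about 5.7e_14), while items two apart are not.
   d =: <"0 <"2 ?. 1000 4 4 $ 2
   (~: -: ns) d              NB. 1 if the two sieves are identical
   f =: 1 + 3e_14 * i. 5
   (~: f) ,: ns f            NB. the two sieves, one per row, for comparison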

-Chris





On Wed, Jun 26, 2013 at 6:31 PM, Tracy Harms <[email protected]> wrote:

> In such cases it may be worthwhile to make a keying list with items that
> correspond to those of the list for which you want to compute the nub. If
> you calculate simple unique values for each item you may rely on the
> correspondence as needed.
>
> On Jun 25, 2013 11:29 PM, "Christopher Rosin" <[email protected]> wrote:
>
> > I was having a performance problem that I traced to nub applied to boxed
> > arrays.
> >
> > Nub sieve ~: gives the same results here whether the items are unboxed,
> > boxed, or doubly boxed:
> >    +/ ~: (?. 10000 4 4 $ 2)
> > 9255
> >    +/ ~: <"2 (?. 10000 4 4 $ 2)
> > 9255
> >    +/ ~: <"0 <"2 (?. 10000 4 4 $ 2)
> > 9255
> >
> > But the runtime is very different in the doubly boxed case:
> >    6!:2 '+/ ~: (?. 10000 4 4 $ 2)'
> > 0.00105408
> >    6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
> > 0.00585098
> >    6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
> > 14.9312
> >
> > With the items boxed only once, performance appears close to linear:
> >    6!:2 '+/ ~: <"2 (?. 1000 4 4 $ 2)'
> > 0.000527954
> >    6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
> > 0.00488113
> >    6!:2 '+/ ~: <"2 (?. 100000 4 4 $ 2)'
> > 0.075351
> >
> > But with doubly boxed items, performance seems to become nearly quadratic:
> >    6!:2 '+/ ~: <"0 <"2 (?. 1000 4 4 $ 2)'
> > 0.162159
> >    6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
> > 14.9312
> >    6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
> > 1106.85
> >
> > Timing is similar with nub instead of nub sieve.
> >
> > Is there any J documentation that explains the performance of nub in
> > various scenarios?  I haven't been able to find any.
> >
> > Thanks.
> > -Chris
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
