I was wrong, it should have been
(x i. y) -: ((i.~.)x){~(~.x) i. y
It still is a lot more efficient if [EMAIL PROTECTED] is much smaller than # .
R.E. Boss
> -----Oorspronkelijk bericht-----
> Van: [EMAIL PROTECTED] [mailto:programming-
> [EMAIL PROTECTED] Namens Roger Hui
> Verzonden: vrijdag 12 oktober 2007 17:13
> Aan: Programming forum
> Onderwerp: Re: [Jprogramming] Performance of case-insensitive lookup
>
> > But then, in those cases, you should use the nub anyhow since
> > (x i. y) -: (~.x) i. y
>
> That is not true in general:
>
> x=: 20 [EMAIL PROTECTED] 10
> y=: 5 [EMAIL PROTECTED] 10
> x i. y
> 20 1 9 6 4
> (~.x) i. y
> 9 1 6 5 4
>
> The "those cases" for which the statement is true, must be
> that x-:~.x or that y has 0 items or some more complicated
> condition on x and y (all items of y occur in a prefix of x
> that has all unique items).
>
>
>
> ----- Original Message -----
> From: "R.E. Boss" <[EMAIL PROTECTED]>
> Date: Friday, October 12, 2007 3:04
> Subject: RE: [Jprogramming] Performance of case-insensitive lookup
> To: 'Programming forum' <[email protected]>
>
> > The adverb I use for these cases, where [EMAIL PROTECTED] is much less than
> > #
> > , is
> >
> > fst=:1 : '](i.!.0~ { u @:]) ~.'
> >
> > 5 ts'tolower &.> ppl'
> > 0.018118821 72128
> > 5 ts'tolower &.>fst ppl'
> > 0.00065326459 19328
> >
> > (tolower &.>fst -: tolower &.>) ppl
> > 1
> >
> > It is about 28 times as fast and almost 4 times as lean, which
> > gives a
> > relative performance of more than 100. which is still inferior
> > to yours.
> >
> > But then, in those cases, you should use the nub anyhow since
> >
> > (x i. y) -: (~.x) i. y
> >
> > ts'ppl i.&:(tolower &.>) p'
> > 0.017758768 73856
> > ts'(~.ppl) i.&:(tolower &.>) p'
> > 0.00043554953 9856
> >
> > Which gives a relative performance improvement of a factor 300
> >
> >
> > R.E. Boss
> >
> >
> >
> >
> > > -----Oorspronkelijk bericht-----
> > > Van: [EMAIL PROTECTED] [mailto:programming-
> > > [EMAIL PROTECTED] Namens Sherlock, Ric
> > > Verzonden: vrijdag 12 oktober 2007 11:20
> > > Aan: Programming forum
> > > Onderwerp: [Jprogramming] Performance of case-insensitive lookup
> > >
> > > I was doing a case-insensitive lookup of firstname and
> > lastname in a
> > > 2-column boxed table.
> > > fnames=: <;._1 ' John Dakota Wilson Diana
> > Joan Roberto John John'
> > > lnames=: <;._1 ' Smith Jones Chan Wilson
> > Saxon Angelo Smith Wilson'
> > > ]ppl=:500 $ fnames,.lnames
> > > +-------+------+
> > > |John |Smith |
> > > +-------+------+
> > > |Dakota |Jones |
> > > +-------+------+
> > > |Wilson |Chan |
> > > +-------+------+
> > > |Diana |Wilson|
> > > +-------+------+
> > > |Joan |Saxon |
> > > +-------+------+
> > > |Roberto|Angelo|
> > > +-------+------+
> > > |John |Smith |
> > > +-------+------+
> > > |John |Wilson|
> > > ...
> > >
> > > p=: 'Joan';'Saxon'
> > > p2=:'JOAN';'saxon'
> > > ppl i. p
> > > 4
> > > (tolower each ppl) i. tolower each p2
> > > 4
> > >
> > > However performance wasn't great, which I tracked it down to
> > having to
> > > run the verb tolower so many times. Below I've documented a
> > solution to
> > > this performance problem using inverted tables, but would be
> > interested> in other possible ways of bypassing the performance
> > hit caused by making
> > > the lookup case-insensitive.
> > >
> > > A solution using inverted tables.
> > > (Load collected definitions from
> > > http://www.jsoftware.com/jwiki/Essays/Inverted_Table )
> > >
> > > mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix
> > from vector
> > > pplinv=: ifa ppl
> > > pinv =: ifa mfv p
> > > p2inv=: ifa mfv p2
> > > pplinv tindexof pinv
> > > length error
> > >
> > > The problem is that converting ppl to an inverted table
> > extends each
> > > name to the length of the longest name. For pinv to match, its names
> > > also need to be extended to that same width.
> > > How can this best be done?
> > >
> > > My solutions as follows:
> > > textend=: {:@$&.>@[ {."1&.> ]
> > > pplinv textend pinv
> > > +-------+------+
> > > |Joan |Saxon |
> > > +-------+------+
> > >
> > > pplinv tindexof pplinv textend pinv
> > > 4
> > >
> > > Or more directly:
> > >
> > > tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ]
> > > pplinv tindexof1 pinv
> > > 4
> > > (tolower each pplinv) tindexof1 tolower each p2inv
> > > 4
> > >
> > > ts=: 6!:2 , 7!:[EMAIL PROTECTED]
> > > ts '(tolower each ppl) i. tolower each p2'
> > > 0.0206076470613 147456
> > > ts '(tolower each ppl2inv) tindexof1 tolower
> > each p2inv'
> > > 0.000427987355935 48000
> > >
> > > About 48 times faster and 3 time leaner using inverted tables.
> > >
> > > Even for a single lookup the overhead of converting to
> > inverted tables
> > > is worthwhile:
> > > ts '(tolower each ifa ppl) tindexof1 tolower
> > each ifa mfv p2'
> > > 0.000516546097339 57216
> > >
> > > About 40 times faster and 2.5 times leaner.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm