> But then, in those cases, you should use the nub anyhow since > (x i. y) -: (~.x) i. y
That is not true in general: x=: 20 [EMAIL PROTECTED] 10 y=: 5 [EMAIL PROTECTED] 10 x i. y 20 1 9 6 4 (~.x) i. y 9 1 6 5 4 The "those cases" for which the statement is true, must be that x-:~.x or that y has 0 items or some more complicated condition on x and y (all items of y occur in a prefix of x that has all unique items). ----- Original Message ----- From: "R.E. Boss" <[EMAIL PROTECTED]> Date: Friday, October 12, 2007 3:04 Subject: RE: [Jprogramming] Performance of case-insensitive lookup To: 'Programming forum' <[email protected]> > The adverb I use for these cases, where [EMAIL PROTECTED] is much less than # > , is > > fst=:1 : '](i.!.0~ { u @:]) ~.' > > 5 ts'tolower &.> ppl' > 0.018118821 72128 > 5 ts'tolower &.>fst ppl' > 0.00065326459 19328 > > (tolower &.>fst -: tolower &.>) ppl > 1 > > It is about 28 times as fast and almost 4 times as lean, which > gives a > relative performance of more than 100. which is still inferior > to yours. > > But then, in those cases, you should use the nub anyhow since > > (x i. y) -: (~.x) i. y > > ts'ppl i.&:(tolower &.>) p' > 0.017758768 73856 > ts'(~.ppl) i.&:(tolower &.>) p' > 0.00043554953 9856 > > Which gives a relative performance improvement of a factor 300 > > > R.E. Boss > > > > > > -----Oorspronkelijk bericht----- > > Van: [EMAIL PROTECTED] [mailto:programming- > > [EMAIL PROTECTED] Namens Sherlock, Ric > > Verzonden: vrijdag 12 oktober 2007 11:20 > > Aan: Programming forum > > Onderwerp: [Jprogramming] Performance of case-insensitive lookup > > > > I was doing a case-insensitive lookup of firstname and > lastname in a > > 2-column boxed table. > > fnames=: <;._1 ' John Dakota Wilson Diana > Joan Roberto John John' > > lnames=: <;._1 ' Smith Jones Chan Wilson > Saxon Angelo Smith Wilson' > > ]ppl=:500 $ fnames,.lnames > > +-------+------+ > > |John |Smith | > > +-------+------+ > > |Dakota |Jones | > > +-------+------+ > > |Wilson |Chan | > > +-------+------+ > > |Diana |Wilson| > > +-------+------+ > > |Joan |Saxon | > > +-------+------+ > > |Roberto|Angelo| > > +-------+------+ > > |John |Smith | > > +-------+------+ > > |John |Wilson| > > ... > > > > p=: 'Joan';'Saxon' > > p2=:'JOAN';'saxon' > > ppl i. p > > 4 > > (tolower each ppl) i. tolower each p2 > > 4 > > > > However performance wasn't great, which I tracked it down to > having to > > run the verb tolower so many times. Below I've documented a > solution to > > this performance problem using inverted tables, but would be > interested> in other possible ways of bypassing the performance > hit caused by making > > the lookup case-insensitive. > > > > A solution using inverted tables. > > (Load collected definitions from > > http://www.jsoftware.com/jwiki/Essays/Inverted_Table ) > > > > mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix > from vector > > pplinv=: ifa ppl > > pinv =: ifa mfv p > > p2inv=: ifa mfv p2 > > pplinv tindexof pinv > > length error > > > > The problem is that converting ppl to an inverted table > extends each > > name to the length of the longest name. For pinv to match, its names > > also need to be extended to that same width. > > How can this best be done? > > > > My solutions as follows: > > textend=: {:@$&.>@[ {."1&.> ] > > pplinv textend pinv > > +-------+------+ > > |Joan |Saxon | > > +-------+------+ > > > > pplinv tindexof pplinv textend pinv > > 4 > > > > Or more directly: > > > > tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ] > > pplinv tindexof1 pinv > > 4 > > (tolower each pplinv) tindexof1 tolower each p2inv > > 4 > > > > ts=: 6!:2 , 7!:[EMAIL PROTECTED] > > ts '(tolower each ppl) i. tolower each p2' > > 0.0206076470613 147456 > > ts '(tolower each ppl2inv) tindexof1 tolower > each p2inv' > > 0.000427987355935 48000 > > > > About 48 times faster and 3 time leaner using inverted tables. > > > > Even for a single lookup the overhead of converting to > inverted tables > > is worthwhile: > > ts '(tolower each ifa ppl) tindexof1 tolower > each ifa mfv p2' > > 0.000516546097339 57216 > > > > About 40 times faster and 2.5 times leaner. ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
