Re: [Jprogramming] Performance of case-insensitive lookup

Roger Hui Fri, 12 Oct 2007 08:27:45 -0700

> But then, in those cases, you should use the nub anyhow since 
>       (x i. y) -: (~.x) i. y


That is not true in general:

   x=: 20 [EMAIL PROTECTED] 10
   y=: 5 [EMAIL PROTECTED] 10
   x i. y
20 1 9 6 4
   (~.x) i. y
9 1 6 5 4

The "those cases" for which the statement is true, must be
that  x-:~.x   or that y has 0 items or some more complicated
condition on x and y (all items of y occur in a prefix of x
that has all unique items).



----- Original Message -----
From: "R.E. Boss" <[EMAIL PROTECTED]>
Date: Friday, October 12, 2007 3:04
Subject: RE: [Jprogramming] Performance of case-insensitive lookup
To: 'Programming forum' <[email protected]>

> The adverb I use for these cases, where [EMAIL PROTECTED] is much less than # 
> , is 
> 
>       fst=:1 : '](i.!.0~ { u @:]) ~.'
> 
>    5 ts'tolower &.> ppl'
> 0.018118821 72128
>    5 ts'tolower &.>fst ppl'
> 0.00065326459 19328
> 
>    (tolower &.>fst -: tolower &.>) ppl
> 1
> 
> It is about 28 times as fast and almost 4 times as lean, which 
> gives a
> relative performance of more than 100. which is still inferior 
> to yours.
> 
> But then, in those cases, you should use the nub anyhow since 
> 
>       (x i. y) -: (~.x) i. y
> 
>    ts'ppl i.&:(tolower &.>) p'
> 0.017758768 73856
>    ts'(~.ppl) i.&:(tolower &.>) p'
> 0.00043554953 9856
> 
> Which gives a relative performance improvement of a factor 300
>    
> 
> R.E. Boss
> 
>  
> 
> 
> > -----Oorspronkelijk bericht-----
> > Van: [EMAIL PROTECTED] [mailto:programming-
> > [EMAIL PROTECTED] Namens Sherlock, Ric
> > Verzonden: vrijdag 12 oktober 2007 11:20
> > Aan: Programming forum
> > Onderwerp: [Jprogramming] Performance of case-insensitive lookup
> > 
> > I was doing a case-insensitive lookup of firstname and 
> lastname in a
> > 2-column boxed table.
> >    fnames=: <;._1 ' John Dakota Wilson Diana 
> Joan Roberto John John'
> >    lnames=: <;._1 ' Smith Jones Chan Wilson 
> Saxon Angelo Smith Wilson'
> >    ]ppl=:500 $ fnames,.lnames
> > +-------+------+
> > |John   |Smith |
> > +-------+------+
> > |Dakota |Jones |
> > +-------+------+
> > |Wilson |Chan  |
> > +-------+------+
> > |Diana  |Wilson|
> > +-------+------+
> > |Joan   |Saxon |
> > +-------+------+
> > |Roberto|Angelo|
> > +-------+------+
> > |John   |Smith |
> > +-------+------+
> > |John   |Wilson|
> > ...
> > 
> >    p=: 'Joan';'Saxon'
> >    p2=:'JOAN';'saxon'
> >    ppl i. p
> > 4
> >   (tolower each ppl) i. tolower each p2
> > 4
> > 
> > However performance wasn't great, which I tracked it down to 
> having to
> > run the verb tolower so many times. Below I've documented a 
> solution to
> > this performance problem using inverted tables, but would be 
> interested> in other possible ways of bypassing the performance 
> hit caused by making
> > the lookup case-insensitive.
> > 
> > A solution using inverted tables.
> > (Load collected definitions from
> > http://www.jsoftware.com/jwiki/Essays/Inverted_Table )
> > 
> >    mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix 
> from vector
> >    pplinv=: ifa ppl
> >    pinv =:  ifa mfv p
> >    p2inv=:  ifa mfv p2
> >    pplinv tindexof pinv
> > length error
> > 
> > The problem is that converting ppl to an inverted table 
> extends each
> > name to the length of the longest name. For pinv to match, its names
> > also need to be extended to that same width.
> > How can this best be done?
> > 
> > My solutions as follows:
> >    textend=: {:@$&.>@[ {."1&.> ]
> >    pplinv textend pinv
> > +-------+------+
> > |Joan   |Saxon |
> > +-------+------+
> > 
> >    pplinv tindexof pplinv textend pinv
> > 4
> > 
> > Or more directly:
> > 
> >    tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ]
> >    pplinv tindexof1 pinv
> > 4
> >    (tolower each pplinv) tindexof1 tolower each p2inv
> > 4
> > 
> >    ts=: 6!:2 , 7!:[EMAIL PROTECTED]
> >    ts '(tolower each ppl) i. tolower each p2'
> > 0.0206076470613 147456
> >    ts '(tolower each ppl2inv) tindexof1 tolower 
> each p2inv'
> > 0.000427987355935 48000
> > 
> > About 48 times faster and 3 time leaner using inverted tables.
> > 
> > Even for a single lookup the overhead of converting to 
> inverted tables
> > is worthwhile:
> >    ts '(tolower each ifa ppl) tindexof1 tolower 
> each ifa mfv p2'
> > 0.000516546097339 57216
> > 
> > About 40 times faster and 2.5 times leaner.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Performance of case-insensitive lookup

Reply via email to