The adverb I use for these cases, where [EMAIL PROTECTED] is much less than # ,
is
fst=:1 : '](i.!.0~ { u @:]) ~.'
5 ts'tolower &.> ppl'
0.018118821 72128
5 ts'tolower &.>fst ppl'
0.00065326459 19328
(tolower &.>fst -: tolower &.>) ppl
1
It is about 28 times as fast and almost 4 times as lean, which gives a
relative performance gain of more than 100, though that is still inferior to yours.
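
For readers who want to see what the adverb does step by step, here is a minimal sketch. The list nms and the names uniq, idx and low are made up for illustration and are not from the session above; tolower is assumed to come from the standard library, as in the timings.

```j
NB. Hypothetical sample data: five items, only three of them distinct.
nms  =: 'Joan';'JOHN';'Joan';'Diana';'JOHN'

uniq =: ~. nms             NB. distinct items, in order of first appearance
idx  =: uniq i. nms        NB. position of each original item within the nub
low  =: tolower&.> uniq    NB. the expensive verb runs on 3 items, not 5

NB. Indexing the small result by idx rebuilds the full-length answer.
(idx { low) -: tolower&.> nms

NB. The adverb packages the same three steps.
fst=:1 : '](i.!.0~ { u @:]) ~.'
(tolower&.> fst nms) -: tolower&.> nms
```

The saving grows with the ratio of items to distinct items, which is why it pays off so well on ppl.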
But then, in those cases, you should use the nub anyhow since
(x i. y) -: (~.x) i. y
as long as the items of x up to the match are all distinct (true for p here).
ts'ppl i.&:(tolower &.>) p'
0.017758768 73856
ts'(~.ppl) i.&:(tolower &.>) p'
0.00043554953 9856
That gives a relative performance improvement of a factor of about 300.
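
One caveat on the nub lookup: an index into ~.x is only the same as an index into x when no duplicate comes before the match. A minimal sketch of mapping the nub indices back, assuming made-up data (tbl, keys) and illustrative names (uniq, firstpos) that are not from the session above:

```j
NB. Hypothetical sample data: 'John' is duplicated before 'Wilson'.
tbl  =: 'John';'Dakota';'John';'Wilson'
keys =: 'Wilson';'John';'Zed'        NB. 'Zed' is absent from tbl

uniq     =: ~. tbl                   NB. John Dakota Wilson
firstpos =: (tbl i. uniq) , # tbl    NB. 0 1 3, then 4 for "not found"

tbl i. keys                          NB. 3 0 4
uniq i. keys                         NB. 2 0 3  (positions in the nub)

NB. Mapping through firstpos recovers the indices into tbl.
(firstpos {~ uniq i. keys) -: tbl i. keys
```

Computing firstpos costs one extra lookup of the nub in the table, so it is best done once and reused across lookups.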
R.E. Boss
> -----Original message-----
> From: [EMAIL PROTECTED] [mailto:programming-
> [EMAIL PROTECTED] On behalf of Sherlock, Ric
> Sent: Friday 12 October 2007 11:20
> To: Programming forum
> Subject: [Jprogramming] Performance of case-insensitive lookup
>
> I was doing a case-insensitive lookup of firstname and lastname in a
> 2-column boxed table.
> fnames=: <;._1 ' John Dakota Wilson Diana Joan Roberto John John'
> lnames=: <;._1 ' Smith Jones Chan Wilson Saxon Angelo Smith Wilson'
> ]ppl=:500 $ fnames,.lnames
> +-------+------+
> |John   |Smith |
> +-------+------+
> |Dakota |Jones |
> +-------+------+
> |Wilson |Chan  |
> +-------+------+
> |Diana  |Wilson|
> +-------+------+
> |Joan   |Saxon |
> +-------+------+
> |Roberto|Angelo|
> +-------+------+
> |John   |Smith |
> +-------+------+
> |John   |Wilson|
> ...
>
> p=: 'Joan';'Saxon'
> p2=:'JOAN';'saxon'
> ppl i. p
> 4
> (tolower each ppl) i. tolower each p2
> 4
>
> However, performance wasn't great, which I tracked down to having to
> run the verb tolower so many times. Below I've documented a solution to
> this performance problem using inverted tables, but would be interested
> in other possible ways of bypassing the performance hit caused by making
> the lookup case-insensitive.
>
> A solution using inverted tables.
> (Load collected definitions from
> http://www.jsoftware.com/jwiki/Essays/Inverted_Table )
>
> mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix from vector
> pplinv=: ifa ppl
> pinv =: ifa mfv p
> p2inv=: ifa mfv p2
> pplinv tindexof pinv
> length error
>
> The problem is that converting ppl to an inverted table extends each
> name to the length of the longest name. For pinv to match, its names
> also need to be extended to that same width.
> How can this best be done?
>
> My solution is as follows:
> textend=: {:@$&.>@[ {."1&.> ]
> pplinv textend pinv
> +-------+------+
> |Joan   |Saxon |
> +-------+------+
>
> pplinv tindexof pplinv textend pinv
> 4
>
> Or more directly:
>
> tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ]
> pplinv tindexof1 pinv
> 4
> (tolower each pplinv) tindexof1 tolower each p2inv
> 4
>
> ts=: 6!:2 , 7!:[EMAIL PROTECTED]
> ts '(tolower each ppl) i. tolower each p2'
> 0.0206076470613 147456
> ts '(tolower each pplinv) tindexof1 tolower each p2inv'
> 0.000427987355935 48000
>
> About 48 times faster and 3 times leaner using inverted tables.
>
> Even for a single lookup the overhead of converting to inverted tables
> is worthwhile:
> ts '(tolower each ifa ppl) tindexof1 tolower each ifa mfv p2'
> 0.000516546097339 57216
>
> About 40 times faster and 2.5 times leaner.
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm