[REBOL] Find speed Re:(3)

2000-07-07 Thread brian . hawley

Ladislav wrote:
My experiment was raher related to Brian's Memoize, or other
simple associative structures. The result is, that for structures
containing more than 4 096 elements, the natives Find, Select,...
can successfully be replaced by some appropriately designed
approaches.

Associative structures are best accessed with select, not
find. The hash! type optimizes select, but is no faster
than a block with find, as far as I can tell.

Your bfind is faster than find with sorted data, as find
uses a linear search. When working with associative lists
you need to consider the overhead of sorting the data as
well. This either means resorting the list after every
insert, using an incremental insertion sort, or making
sure the data is inserted in order. Any of these may be
efficient to do manually in some circumstances. Another
advantage to your binary search is that you can search
records based on keys (with a little rewriting), making
it more appropriate for record lookup than find.

Memoizing is only an advantage when the set of arguments
is small, when a lot of repeated calls with the same value
are common. Otherwise the overhead of inserts, either from
hashing overhead or incremental insertion sort (so you can
use bfind), outweighs the benefit. Even then memoizing is
only appropriate when the cost of calculating the return
value is more than the cost of memoizing it, and when no
side effects occur. You also must consider the memory
requirements of the lookup table.

Over all I've found hash! tables to be most efficient for
association lists, using select to do the lookup. Lookup
tables are only a good idea when lookups outnumber inserts
by enough to make up for the insert overhead. Any other
case makes memoization a bad idea anyway.

Any other kind of record lookup, say of records with more
than 2 fields, would be more efficient with your bfind
function modified to work with other record lengths with
a /skip refinement like sort.

Brian Hawley




[REBOL] Find speed Re:(3)

2000-07-06 Thread icimjs

Hi Ladislav,

your test function is responsible for the results you get, not the hash!

The critical code in the time-blk function is
start: now
loop :count :block
finish: now

When you test the hash function, your timing function (time-blk) loops over
the block:

[
random/seed 1
c: 0
blk: make hash! []
repeat i n [
d: random m
if not find blk d [
c: c + 1
insert blk d
]
]
] 0.05

This block includes the CREATION of the hash and INSERTING items into the
hash. Certainly creating a hash and inserting items into the hash is not
the same thing as searching in a hash and the time required to CREATE the
data structure and INSERT items into the data structure should not be part
of the value reported for the time required to SEARCH in the data structure.

Inserting an element into a hash is by nature of hashing algorithms far
more time consuming than inserting an element into a block. Hashes are
optimized for locating an element at the penalty of consuming more time for
insertion. To correctly identify the time it takes to locate an element in
a hash, you must separate the insertion of the element into the hash from
the time you calculate to retrieve the element in the hash.

If you separate insertion from the search in a hash, you must do the same
for the blocks as well. Only then will you be able to compare values that
reflect how long it takes to SEARCH in the different series types you are
testing!

At 10:46 AM 7/6/00 +0200, you wrote:
Hi,

regarding memoizing. There is a class of functions, that must use
some kind of memoizing to work at all, I think. The number of the
possibilities for parameters does not matter, you can store only
important results (depending on what you consider important - eg.
expensive or recent results...)

Because I used a non-real-life example for testing, here is
something more adequate for memoizing purposes:

Block/find time: 14.5 count: 7771
Block/bfind time: 5.875 count: 7771
Hash/find time: 28.5 count: 7771

The source code:

REBOL[
Title: "Seconds"
Author: "Ladislav Mecir"
Email: [EMAIL PROTECTED]
Date: 19/5/1999
File: %seconds.r
]

seconds: function [
{Compute difference between dates in seconds}
a [date!] "first date"
b [date!] "second date"
] [diff] [
diff: b - a
diff: 24 * diff + b/time/hour - a/time/hour +
b/zone/hour - a/zone/hour
diff: 60 * diff + b/time/minute - a/time/minute
60 * diff + b/time/second - a/time/second
]


REBOL[
Title: "Time-block"
Author: "Ladislav Mecir"
E-mail: [EMAIL PROTECTED]
Date: 19/5/1999
File: %timblk.r
Purpose: {
Measure time of a block execution with given relative
accuracy.
This function can even time the execution of the empty
block.
Suggested values for accuracy are: 0.05 to 0.30.
Accuracy better than 0.05 may not be reachable,
accuracy worse than 0.30 may not be useful.
}
]

include %seconds.r

time-block: function [
"Time a block."
block [block!]
accuracy [decimal!]
/verbose
] [
guess count lower upper start finish result
] [
if verbose [print ["Timing a block:" block]]
guess: 0
count: 1
lower: 1 - :accuracy
upper: 1 + :accuracy
while [
start: now
loop :count :block
finish: now
result: (seconds start finish) / count
if verbose [
prin "Iterations: "
prin count
prin ". Time/iteration: "
prin result
prin " seconds.^/"
]
any [
result = 0
guess  (result * lower)
guess  (result * upper)
]
][
guess: result
count: count * 2
]
result
]

{
Example:

time-block [] 0,05
}

Rebol []

bfind: func [
blk [block!]
value
/local d u m
] [
d: 1
u: length? blk
while [(m: to integer! (d + u) / 2)  d] [
either (pick blk m) = value [d: m] [u: m]
]
either (pick blk u) = value [u] [d]
]

include %timblk.r

n: 8192 ; the test size
m: 10 * n ; random number range
prin "Block/find time: "
prin time-block [
random/seed 1
c: 0
blk: copy []
repeat i n [
d: random m
if not find blk d [
c: c + 1
append blk d
]
]
] 0.05
prin " count: "
print c
prin "Block/bfind time: "
prin time-block [
random/seed 1
c: 0
blk: copy []
repeat i n [
d: random m
either empty? blk [
c: c + 1
insert blk d
] [
bf: bfind blk d
if not (pick blk bf) = d [
c: c + 1
insert skip blk either (pick blk bf)  d [
bf - 1
] [
 bf
] d
]
]
]
] 0.05
prin " count: "
print c
prin "Hash/find time: "
prin time-block [
random/seed 1
c: 0