[REBOL] Find speed Re:(3)

icimjs Thu, 06 Jul 2000 13:03:02 -0700
Hi Ladislav,

your test function is responsible for the results you get, not the hash!

The critical code in the time-blk function is
>        start: now
>        loop :count :block
>        finish: now

When you test the hash function, your timing function (time-blk) loops over
the block:

>[
>    random/seed 1
>    c: 0
>    blk: make hash! []
>    repeat i n [
>        d: random m
>        if not find blk d [
>            c: c + 1
>            insert blk d
>        ]
>    ]
>] 0.05

This block includes the CREATION of the hash and INSERTING items into the
hash. Certainly creating a hash and inserting items into the hash is not
the same thing as searching in a hash and the time required to CREATE the
data structure and INSERT items into the data structure should not be part
of the value reported for the time required to SEARCH in the data structure.

Inserting an element into a hash is by nature of hashing algorithms far
more time consuming than inserting an element into a block. Hashes are
optimized for locating an element at the penalty of consuming more time for
insertion. To correctly identify the time it takes to locate an element in
a hash, you must separate the insertion of the element into the hash from
the time you calculate to retrieve the element in the hash.

If you separate insertion from the search in a hash, you must do the same
for the blocks as well. Only then will you be able to compare values that
reflect how long it takes to SEARCH in the different series types you are
testing!

At 10:46 AM 7/6/00 +0200, you wrote:
>Hi,
>
>regarding memoizing. There is a class of functions, that must use
>some kind of memoizing to work at all, I think. The number of the
>possibilities for parameters does not matter, you can store only
>important results (depending on what you consider important - eg.
>expensive or recent results...)
>
>Because I used a non-real-life example for testing, here is
>something more adequate for memoizing purposes:
>
>Block/find time: 14.5 count: 7771
>Block/bfind time: 5.875 count: 7771
>Hash/find time: 28.5 count: 7771
>
>The source code:
>
>REBOL[
>    Title: "Seconds"
>    Author: "Ladislav Mecir"
>    Email: [EMAIL PROTECTED]
>    Date: 19/5/1999
>    File: %seconds.r
>]
>
>seconds: function [
>    {Compute difference between dates in seconds}
>    a [date!] "first date"
>    b [date!] "second date"
>] [diff] [
>    diff: b - a
>    diff: 24 * diff + b/time/hour - a/time/hour +
>        b/zone/hour - a/zone/hour
>    diff: 60 * diff + b/time/minute - a/time/minute
>    60 * diff + b/time/second - a/time/second
>]
>
>
>REBOL[
>    Title: "Time-block"
>    Author: "Ladislav Mecir"
>    E-mail: [EMAIL PROTECTED]
>    Date: 19/5/1999
>    File: %timblk.r
>    Purpose: {
>        Measure time of a block execution with given relative
>accuracy.
>        This function can even time the execution of the empty
>block.
>        Suggested values for accuracy are: 0.05 to 0.30.
>        Accuracy better than 0.05 may not be reachable,
>        accuracy worse than 0.30 may not be useful.
>    }
>]
>
>include %seconds.r
>
>time-block: function [
>    "Time a block."
>    block [block!]
>    accuracy [decimal!]
>    /verbose
>] [
>    guess count lower upper start finish result
>] [
>    if verbose [print ["Timing a block:" block]]
>    guess: 0
>    count: 1
>    lower: 1 - :accuracy
>    upper: 1 + :accuracy
>    while [
>        start: now
>        loop :count :block
>        finish: now
>        result: (seconds start finish) / count
>        if verbose [
>            prin "Iterations: "
>            prin count
>            prin ". Time/iteration: "
>            prin result
>            prin " seconds.^/"
>        ]
>        any [
>            result <= 0
>            guess < (result * lower)
>            guess > (result * upper)
>        ]
>    ][
>        guess: result
>        count: count * 2
>    ]
>    result
>]
>
>{
>    Example:
>
>    time-block [] 0,05
>}
>
>Rebol []
>
>bfind: func [
>    blk [block!]
>    value
>    /local d u m
>] [
>    d: 1
>    u: length? blk
>    while [(m: to integer! (d + u) / 2) > d] [
>        either (pick blk m) <= value [d: m] [u: m]
>    ]
>    either (pick blk u) <= value [u] [d]
>]
>
>include %timblk.r
>
>n: 8192 ; the test size
>m: 10 * n ; random number range
>prin "Block/find time: "
>prin time-block [
>    random/seed 1
>    c: 0
>    blk: copy []
>    repeat i n [
>        d: random m
>        if not find blk d [
>            c: c + 1
>            append blk d
>        ]
>    ]
>] 0.05
>prin " count: "
>print c
>prin "Block/bfind time: "
>prin time-block [
>    random/seed 1
>    c: 0
>    blk: copy []
>    repeat i n [
>        d: random m
>        either empty? blk [
>            c: c + 1
>            insert blk d
>        ] [
>            bf: bfind blk d
>            if not (pick blk bf) = d [
>                c: c + 1
>                insert skip blk either (pick blk bf) > d [
>                    bf - 1
>                ] [
>                     bf
>                ] d
>            ]
>        ]
>    ]
>] 0.05
>prin " count: "
>print c
>prin "Hash/find time: "
>prin time-block [
>    random/seed 1
>    c: 0
>    blk: make hash! []
>    repeat i n [
>        d: random m
>        if not find blk d [
>            c: c + 1
>            insert blk d
>        ]
>    ]
>] 0.05
>prin " count: "
>print c
>
>
>>
>>
>> Hi, Ladislav,
>>
>> if you are interested only in an associative array,
>> then a properly implemented hash should beat the
>> hell out of even a binary search on large data.
>>
>> I guess it depends somewhat on the quality of the
>> hash function and other aspects of the implementation.
>>
>> So, if you can beat the Rebol hash, which is probably compiled
>> c-code, with your binary search written in interpreted Rebol,
>> then RT had better take a look at their Rebol hashes.
>>
>> Would you like to share some test source code with us
>> so we can try to duplicate and analyze your results?
>>
>> Thanks!
>>
>> -galt
>>
>> p.s. Does memo-ize only make sense when there are a limited
>> number of possibilities for the parameters?  If the space of
>possible
>> values is very large and requests could come in a sort of
>> random or rarely repeating way then the memo-ize trick wouldn't
>> do much good, would it?
>>
>>
>>
>
>
>

;- Elan [ : - ) ]
[REBOL] Find speed Re:(3)

Reply via email to