[REBOL] Find speed Re:(3)
Ladislav wrote: My experiment was raher related to Brian's Memoize, or other simple associative structures. The result is, that for structures containing more than 4 096 elements, the natives Find, Select,... can successfully be replaced by some appropriately designed approaches. Associative structures are best accessed with select, not find. The hash! type optimizes select, but is no faster than a block with find, as far as I can tell. Your bfind is faster than find with sorted data, as find uses a linear search. When working with associative lists you need to consider the overhead of sorting the data as well. This either means resorting the list after every insert, using an incremental insertion sort, or making sure the data is inserted in order. Any of these may be efficient to do manually in some circumstances. Another advantage to your binary search is that you can search records based on keys (with a little rewriting), making it more appropriate for record lookup than find. Memoizing is only an advantage when the set of arguments is small, when a lot of repeated calls with the same value are common. Otherwise the overhead of inserts, either from hashing overhead or incremental insertion sort (so you can use bfind), outweighs the benefit. Even then memoizing is only appropriate when the cost of calculating the return value is more than the cost of memoizing it, and when no side effects occur. You also must consider the memory requirements of the lookup table. Over all I've found hash! tables to be most efficient for association lists, using select to do the lookup. Lookup tables are only a good idea when lookups outnumber inserts by enough to make up for the insert overhead. Any other case makes memoization a bad idea anyway. Any other kind of record lookup, say of records with more than 2 fields, would be more efficient with your bfind function modified to work with other record lengths with a /skip refinement like sort. Brian Hawley
[REBOL] Find speed Re:(3)
Hi Ladislav, your test function is responsible for the results you get, not the hash! The critical code in the time-blk function is start: now loop :count :block finish: now When you test the hash function, your timing function (time-blk) loops over the block: [ random/seed 1 c: 0 blk: make hash! [] repeat i n [ d: random m if not find blk d [ c: c + 1 insert blk d ] ] ] 0.05 This block includes the CREATION of the hash and INSERTING items into the hash. Certainly creating a hash and inserting items into the hash is not the same thing as searching in a hash and the time required to CREATE the data structure and INSERT items into the data structure should not be part of the value reported for the time required to SEARCH in the data structure. Inserting an element into a hash is by nature of hashing algorithms far more time consuming than inserting an element into a block. Hashes are optimized for locating an element at the penalty of consuming more time for insertion. To correctly identify the time it takes to locate an element in a hash, you must separate the insertion of the element into the hash from the time you calculate to retrieve the element in the hash. If you separate insertion from the search in a hash, you must do the same for the blocks as well. Only then will you be able to compare values that reflect how long it takes to SEARCH in the different series types you are testing! At 10:46 AM 7/6/00 +0200, you wrote: Hi, regarding memoizing. There is a class of functions, that must use some kind of memoizing to work at all, I think. The number of the possibilities for parameters does not matter, you can store only important results (depending on what you consider important - eg. expensive or recent results...) Because I used a non-real-life example for testing, here is something more adequate for memoizing purposes: Block/find time: 14.5 count: 7771 Block/bfind time: 5.875 count: 7771 Hash/find time: 28.5 count: 7771 The source code: REBOL[ Title: "Seconds" Author: "Ladislav Mecir" Email: [EMAIL PROTECTED] Date: 19/5/1999 File: %seconds.r ] seconds: function [ {Compute difference between dates in seconds} a [date!] "first date" b [date!] "second date" ] [diff] [ diff: b - a diff: 24 * diff + b/time/hour - a/time/hour + b/zone/hour - a/zone/hour diff: 60 * diff + b/time/minute - a/time/minute 60 * diff + b/time/second - a/time/second ] REBOL[ Title: "Time-block" Author: "Ladislav Mecir" E-mail: [EMAIL PROTECTED] Date: 19/5/1999 File: %timblk.r Purpose: { Measure time of a block execution with given relative accuracy. This function can even time the execution of the empty block. Suggested values for accuracy are: 0.05 to 0.30. Accuracy better than 0.05 may not be reachable, accuracy worse than 0.30 may not be useful. } ] include %seconds.r time-block: function [ "Time a block." block [block!] accuracy [decimal!] /verbose ] [ guess count lower upper start finish result ] [ if verbose [print ["Timing a block:" block]] guess: 0 count: 1 lower: 1 - :accuracy upper: 1 + :accuracy while [ start: now loop :count :block finish: now result: (seconds start finish) / count if verbose [ prin "Iterations: " prin count prin ". Time/iteration: " prin result prin " seconds.^/" ] any [ result = 0 guess (result * lower) guess (result * upper) ] ][ guess: result count: count * 2 ] result ] { Example: time-block [] 0,05 } Rebol [] bfind: func [ blk [block!] value /local d u m ] [ d: 1 u: length? blk while [(m: to integer! (d + u) / 2) d] [ either (pick blk m) = value [d: m] [u: m] ] either (pick blk u) = value [u] [d] ] include %timblk.r n: 8192 ; the test size m: 10 * n ; random number range prin "Block/find time: " prin time-block [ random/seed 1 c: 0 blk: copy [] repeat i n [ d: random m if not find blk d [ c: c + 1 append blk d ] ] ] 0.05 prin " count: " print c prin "Block/bfind time: " prin time-block [ random/seed 1 c: 0 blk: copy [] repeat i n [ d: random m either empty? blk [ c: c + 1 insert blk d ] [ bf: bfind blk d if not (pick blk bf) = d [ c: c + 1 insert skip blk either (pick blk bf) d [ bf - 1 ] [ bf ] d ] ] ] ] 0.05 prin " count: " print c prin "Hash/find time: " prin time-block [ random/seed 1 c: 0