[REBOL] Find speed Re:(4)
Hi, (...)

> Associative structures are best accessed with select, not
> find. The hash! type optimizes select, but is no faster
> than a block with find, as far as I can tell.

I think that select is sometimes not suitable for accessing associative structures. See this:

    associative: [1 6 2 5 3 4 4 3 5 2 6 1]
    >> print select associative 5
    3

The first 5 in the block is the value paired with the key 2, so select returns the element that follows it (3) rather than the value associated with the key 5 (2).
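The pitfall can also be sketched outside REBOL. This Python sketch (illustrative names, nothing beyond what the thread shows) mimics select's flat scan, plus a key-aware variant that treats the list as fixed-width key/value records:

```python
def select_flat(block, target):
    """Mimic REBOL's plain `select`: return the element following the
    first occurrence of `target` anywhere in a flat list, or None."""
    for i, item in enumerate(block):
        if item == target and i + 1 < len(block):
            return block[i + 1]
    return None

def select_keys(block, key, skip=2):
    """Key-aware variant: inspect only every `skip`-th element, so
    values can never be mistaken for keys."""
    for i in range(0, len(block) - 1, skip):
        if block[i] == key:
            return block[i + 1]
    return None

# Intended as pairs: 1->6, 2->5, 3->4, 4->3, 5->2, 6->1
associative = [1, 6, 2, 5, 3, 4, 4, 3, 5, 2, 6, 1]

print(select_flat(associative, 5))  # -> 3: matched the *value* of key 2
print(select_keys(associative, 5))  # -> 2: matched the key 5 itself
```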
[REBOL] Find speed Re:(4)
Umm, the reason that the hash is slow is that it only hashes string type values. The test you were using had numeric values. Try your tests with strings and you should see a significant difference, and I will add numeric values to the hash as well for the next release... However, I'm not planning on hashing block values...

- jim

At 06:31 PM 7/7/2000 -0500, you wrote:
>Ladislav wrote:
>>My experiment was rather related to Brian's Memoize, or other
>>simple associative structures. The result is that for structures
>>containing more than 4096 elements, the natives Find, Select, ...
>>can successfully be replaced by some appropriately designed
>>approaches.
>
>Associative structures are best accessed with select, not
>find. The hash! type optimizes select, but is no faster
>than a block with find, as far as I can tell.
>
>Your bfind is faster than find with sorted data, as find
>uses a linear search. When working with associative lists
>you need to consider the overhead of sorting the data as
>well. This either means resorting the list after every
>insert, using an incremental insertion sort, or making
>sure the data is inserted in order. Any of these may be
>efficient to do manually in some circumstances. Another
>advantage to your binary search is that you can search
>records based on keys (with a little rewriting), making
>it more appropriate for record lookup than find.
>
>Memoizing is only an advantage when the set of arguments
>is small, when a lot of repeated calls with the same value
>are common. Otherwise the overhead of inserts, either from
>hashing overhead or incremental insertion sort (so you can
>use bfind), outweighs the benefit. Even then memoizing is
>only appropriate when the cost of calculating the return
>value is more than the cost of memoizing it, and when no
>side effects occur. You also must consider the memory
>requirements of the lookup table.
>
>Overall I've found hash! tables to be most efficient for
>association lists, using select to do the lookup. Lookup
>tables are only a good idea when lookups outnumber inserts
>by enough to make up for the insert overhead. Any other
>case makes memoization a bad idea anyway.
>
>Any other kind of record lookup, say of records with more
>than 2 fields, would be more efficient with your bfind
>function modified to work with other record lengths with
>a /skip refinement like sort.
>
>Brian Hawley
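Brian's suggestion of a bfind generalized to fixed-width records (a /skip-style record width, as with sort) can be sketched as follows. This Python version is an illustration under that assumption, not Ladislav's actual bfind:

```python
def bfind(records, key, skip=2):
    """Binary search a flat list of fixed-width records, sorted by the
    first field of each record. `skip` is the record width, playing the
    role of sort's /skip refinement. Returns the index where the
    matching record starts, or None if the key is absent."""
    lo, hi = 0, len(records) // skip
    while lo < hi:
        mid = (lo + hi) // 2
        if records[mid * skip] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(records) // skip and records[lo * skip] == key:
        return lo * skip
    return None

# Three-field records sorted by key: (1,"a","x"), (3,"b","y"), (5,"c","z")
records = [1, "a", "x", 3, "b", "y", 5, "c", "z"]
print(bfind(records, 3, skip=3))  # -> 3 (record starts at index 3)
print(bfind(records, 4, skip=3))  # -> None
```

As the message notes, the O(log n) lookup only pays off if the flat list is kept sorted, so it must be weighed against the sorting overhead incurred on every insert.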
[REBOL] Find speed Re:(4)
Hi,

> When you test the hash function, your timing function (time-blk) loops over
> the block: (...)
> This block includes the CREATION of the hash and INSERTING items into the
> hash. Certainly creating a hash and inserting items into the hash is not
> the same thing as searching in a hash, and the time required to CREATE the
> data structure and INSERT items into the data structure should not be part
> of the value reported for the time required to SEARCH in the data structure.
>
> Inserting an element into a hash is by nature of hashing algorithms far
> more time consuming than inserting an element into a block. Hashes are
> optimized for locating an element at the penalty of consuming more time
> for insertion. To correctly identify the time it takes to locate an
> element in a hash, you must separate the insertion of the element into
> the hash from the time you calculate to retrieve the element in the hash.
>
> If you separate insertion from the search in a hash, you must do the same
> for the blocks as well. Only then will you be able to compare values that
> reflect how long it takes to SEARCH in the different series types you are
> testing!

I was testing the usability of the hash! datatype for memoizing functions; the code is quite typical. The results show that the hash! datatype is not suitable for that purpose. I tested "pure searching" code too; see the results and the code:

    Test size: 4096
    Block/find time: 7.8125E-3 not found
    Block/bfind time: 9.765625E-4 not found
    Hash/find time: 6.591796875E-3 not found

    Test size: 8192
    Block/find time: 1.5625E-2 not found
    Block/bfind time: 1.220703125E-3 not found
    Hash/find time: 1.5625E-2 not found

    Test size: 16384
    Block/find time: 2.34375E-2 not found
    Block/bfind time: 1.28173828125E-3 not found
    Hash/find time: 2.587890625E-2 not found

    include %timblk.r

    n: to integer! ask "Test size: "
    m: 10 * n  ; random number range

    prin "Block/find time: "
    random/seed 1
    c: 0
    blk: copy []
    repeat i n [
        d: random m
        append blk d
    ]
    k: random m
    prin time-block [l: find blk k] 0.05
    either l [prin " index: " print index? l] [print " not found"]

    prin "Block/bfind time: "
    random/seed 1
    c: 0
    blk: copy []
    repeat i n [
        d: random m
        append blk d
    ]
    sort blk
    k: random m
    prin time-block [l: bfind blk k] 0.05
    either (pick blk l) = k [prin " index: " print l] [print " not found"]

    prin "Hash/find time: "
    random/seed 1
    c: 0
    blk: copy []
    repeat i n [
        d: random m
        append blk d
    ]
    blk: make hash! blk
    k: random m
    prin time-block [l: find blk k] 0.05
    either l [prin " index: " print index? l] [print " not found"]