[REBOL] Find speed Re:(4)

2000-07-08 Thread lmecir

Hi,

(...)

> Associative structures are best accessed with select, not
> find. The hash! type optimizes select, but is no faster
> than a block with find, as far as I can tell.
>

I think, that Select is sometimes not suitable for accessing the
associative structures. See this:

Associative: [
1 6
2 5
3 4
4 3
5 2
6 1
]

>> print select associative 5
3






[REBOL] Find speed Re:(4)

2000-07-07 Thread jimg

Umm, the reason that the hash is slow is that it only hashes string type 
values. The test you were using had numeric values. Try your tests with 
strings and you should see a significant difference and I will add numeric 
values to the hash as well for the next release... However, I'm not 
planning on hashing block values...

  - jim

At 06:31 PM 7/7/2000 -0500, you wrote:
>Ladislav wrote:
>>My experiment was raher related to Brian's Memoize, or other
>>simple associative structures. The result is, that for structures
>>containing more than 4 096 elements, the natives Find, Select,...
>>can successfully be replaced by some appropriately designed
>>approaches.
>
>Associative structures are best accessed with select, not
>find. The hash! type optimizes select, but is no faster
>than a block with find, as far as I can tell.
>
>Your bfind is faster than find with sorted data, as find
>uses a linear search. When working with associative lists
>you need to consider the overhead of sorting the data as
>well. This either means resorting the list after every
>insert, using an incremental insertion sort, or making
>sure the data is inserted in order. Any of these may be
>efficient to do manually in some circumstances. Another
>advantage to your binary search is that you can search
>records based on keys (with a little rewriting), making
>it more appropriate for record lookup than find.
>
>Memoizing is only an advantage when the set of arguments
>is small, when a lot of repeated calls with the same value
>are common. Otherwise the overhead of inserts, either from
>hashing overhead or incremental insertion sort (so you can
>use bfind), outweighs the benefit. Even then memoizing is
>only appropriate when the cost of calculating the return
>value is more than the cost of memoizing it, and when no
>side effects occur. You also must consider the memory
>requirements of the lookup table.
>
>Over all I've found hash! tables to be most efficient for
>association lists, using select to do the lookup. Lookup
>tables are only a good idea when lookups outnumber inserts
>by enough to make up for the insert overhead. Any other
>case makes memoization a bad idea anyway.
>
>Any other kind of record lookup, say of records with more
>than 2 fields, would be more efficient with your bfind
>function modified to work with other record lengths with
>a /skip refinement like sort.
>
>Brian Hawley




[REBOL] Find speed Re:(4)

2000-07-07 Thread lmecir

Hi,

> When you test the hash function, your timing function (time-blk)
loops over
> the block:
(...)
> This block includes the CREATION of the hash and INSERTING items
into the
> hash. Certainly creating a hash and inserting items into the
hash is not
> the same thing as searching in a hash and the time required to
CREATE the
> data structure and INSERT items into the data structure should
not be part
> of the value reported for the time required to SEARCH in the
data structure.
>
> Inserting an element into a hash is by nature of hashing
algorithms far
> more time consuming than inserting an element into a block.
Hashes are
> optimized for locating an element at the penalty of consuming
more time for
> insertion. To correctly identify the time it takes to locate an
element in
> a hash, you must separate the insertion of the element into the
hash from
> the time you calculate to retrieve the element in the hash.
>
> If you separate insertion from the search in a hash, you must do
the same
> for the blocks as well. Only then will you be able to compare
values that
> reflect how long it takes to SEARCH in the different series
types you are
> testing!

I was testing the usability of Hash! datatype for memoizing
functions, the code is quite typical. The results show, that Hash!
datatype is not suitable for that purpose.

I tested a "pure searching" code too, see the results and the
code:

Test size: 4096
Block/find time: 7.8125E-3 not found
Block/bfind time: 9.765625E-4 not found
Hash/find time: 6.591796875E-3 not found

Test size: 8192
Block/find time: 1.5625E-2 not found
Block/bfind time: 1.220703125E-3 not found
Hash/find time: 1.5625E-2 not found

Test size: 16384
Block/find time: 2.34375E-2 not found
Block/bfind time: 1.28173828125E-3 not found
Hash/find time: 2.587890625E-2 not found

include %timblk.r

n: to integer! ask "Test size: "
m: 10 * n ; random number range
prin "Block/find time: "
random/seed 1
c: 0
blk: copy []
repeat i n [
d: random m
append blk d
]
k: random m
prin time-block [
l: find blk k
] 0.05
either l [
prin " index: "
print index? l
] [
print " not found"
]
prin "Block/bfind time: "
random/seed 1
c: 0
blk: copy []
repeat i n [
d: random m
append blk d
]
sort blk
k: random m
prin time-block [
l: bfind blk k
] 0.05
either (pick blk l) = k [
prin " index: "
print l
] [
print " not found"
]
prin "Hash/find time: "
random/seed 1
c: 0
blk: copy []
repeat i n [
d: random m
append blk d
]
blk: make hash! blk
k: random m
prin time-block [
l: find blk k
] 0.05
either l [
prin " index: "
print index? l
] [
print " not found"
]