On Sun, 2006-19-03 at 11:32 -0500, jamal wrote:

> I only did the 8 bit test - I may go back and do the 16 bit
> but now i am almost sure that the majority of the results will produce
> similar results. So i dont know if i should bother. 
> 

I repeated with 256 buckets only and 16 bits. The number of possible
values was capped to 0xffff. 

Same observation with X below being < 8. Anytime that mask length
happens regardless of the mask being 16,24 or 32bit, the old one fails
exactly the same way. Anytime the X is >=8 there is no difference
between the two.

> Observation:
> -----------
> 
> For a length of X for the mask, then the ideal value of buckets
> needed is 2^X. So in the case of 0xfc (length of 7), 128 buckets
> suffice.
> 
> 1)If more than 2^X were used, then both hashes wasted them.
> 
> 2)If less than 2^X were used, then the new hash used them effectively,
> but the old one wasted them still.
> 
> The old hash is never able to utilize all the buckets except
> for the following cases:
> a) mask length of 8 was used - this was the case in all hash bucket size
> b) mask length of 4 in the case of 256 buckets.
> #a and #b amount for about < 16% of all possible cases.
> This is what is tagged as a bug.
> 
> 3) The new hash always is several factors better in terms of worst
> case lookups. In general it was a factor of 4 better.
> 
> Conclusion
> ----------
> 
> Other than fixing a bug, then new hash is at least equal to the old
> hash in about 16% of the cases and better in the rest(>80% of the
> time). 
> This is true in the case of both memory abuse and worst case lookups.
> 
> I believe there is room for improvement but it has absolutely
> nothing to do with the old hash (and for that matter not the new one
> either, even though the new one is a lot better).


I am not so certain now you can do better than the current algorithm in
optimizing for lookup as well as memory use.
The key is for the user to look at the mask length used and conclude on
what the divisor should be.

cheers,
jamal

Attachment: j1-16b-256buckets.csv.gz
Description: GNU Zip compressed data

Reply via email to