Grégoire Seux <kamaradclim...@gmail.com> writes:

> hello !
>
> i would like to use the polymorhpic hash function on strings. But i would like
> to know what is the probability of a collision between two hashes. 
>
> my first question is about the range of the Hashtbl.hash function: what is its
> range ? ( string -> [1..N] ?)

It is range [min_int..max_int]. Meaning 31 or 63 bits.

> the second question is : can i assume that the result is a uniform 
> distribution
> over [1..N] ? (for 10? words which is an estimation of the english vocabulary
> size)

If the function is any good it will be reasonably uniform. And if it is
crap then I bet someone have replaced it already.

> the third one is : is it possible to predict which will be the collision ? I
> mean collisions are between words which are very 'similar' (for ex: "boy" and
> "boys") or are completely unpredictable.

You can read the source, analyze the math and try to figure out a
collision. From past experience every widely used hash function so far
has been shown to be vulnerable. Like md5 is now considered compromised
due to the ease of creating a collision. Sha1 is deemed unsafe too. It
is just a matter of how much brainpower and cpu time you want to throw
at it. Or pure bad luck.

Is the Hashtbl.hash the same function used to hash polymorphic variant
types? If so then there are collisions between words which are verry
similar. I knew of an example once but google isn't verry helpfull
finding it again.

MfG
        Goswin

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Reply via email to