On 4 Feb 2005, [EMAIL PROTECTED] wrote:

> I had considered using an MD5/SHA hash at one point since it would
> easily mitigate the vowel issue. The only problem I could see is that
> there is no assurance regarding balance. I can mitigate this by
> making the hash very deep.
> 
> Both systems seem good; I think I will actually need to run some
> simulations to see how balanced the MD5 system would be with my
> current set of customer data.
> 
> Your input is appreciated.

If by balance you mean how evenly the MD5 hashes of your data are
distributed statistically: MD5 is designed so that similar inputs
produce hashes with a large Hamming distance between them (the number
of 1 bits you get when you XOR the two hashes, if I remember my
college classes on this correctly).  This means that user names (which
are fairly similar on the whole, as you observed) will generate hashes
that look nothing like each other.
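
To make the XOR idea concrete, here is a minimal Python sketch that
counts the differing bits between the MD5 hashes of two similar user
names.  The names and the helper functions (md5_bits, hamming_distance)
are made up for illustration and are not from your data or setup.

```python
import hashlib


def md5_bits(text: str) -> int:
    """Return the 128-bit MD5 digest of text as an integer."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def hamming_distance(a: str, b: str) -> int:
    """Count the bits that differ between the MD5 hashes of a and b."""
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")


# Hypothetical, deliberately similar user names.
for first, second in [("jsmith", "jsmyth"), ("alice", "alicf")]:
    print(first, second, hamming_distance(first, second))
```

For a well-mixed 128-bit hash you would expect roughly half the bits
(somewhere around 64) to differ, even when the inputs differ by a single
character.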

Let me know how your tests turn out.
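
A balance test along those lines could be sketched roughly like this in
Python.  I am assuming the bucket is taken from the leading hex digits
of the MD5 hash; bucket_for, bucket_counts, and the generated user names
are placeholders, so substitute your real customer list and whatever
depth you have in mind.

```python
import hashlib
from collections import Counter


def bucket_for(username: str, depth: int = 2) -> str:
    """Pick a bucket from the first `depth` hex digits of the MD5 hash."""
    return hashlib.md5(username.encode("utf-8")).hexdigest()[:depth]


def bucket_counts(usernames, depth: int = 2) -> Counter:
    """Count how many user names land in each bucket."""
    return Counter(bucket_for(name, depth) for name in usernames)


# Placeholder data: replace with the real customer user names.
names = [f"user{i}" for i in range(10000)]
counts = bucket_counts(names, depth=2)

print("buckets used:", len(counts), "of", 16 ** 2)
print("smallest:", min(counts.values()), "largest:", max(counts.values()))
```

If the smallest and largest buckets are within a reasonable factor of
each other, the distribution is probably balanced enough for your
purposes.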

Ted
