I got a way which works fine: use bigint first, and then convert it to bit(32),
and convert it to int4 at last.
declare i integer := 0;
declare h bigint := 0;
begin
for i in 1..length(str) loop
h = (h * 31 + ascii(substring(str, i, 1))) & 4294967295;
end loop;
return cast(cast(h as bit(32)) as int4);
end;
I did some tests which include both positive and negative results, seems all Ok.
On Sep 21, 2012, at 11:21 PM, Craig James <[email protected]> wrote:
> On Thu, Sep 20, 2012 at 7:56 PM, Haifeng Liu <[email protected]> wrote:
>
> On Sep 20, 2012, at 10:34 PM, Craig James <[email protected]> wrote:
>
>>
>>
>> On Thu, Sep 20, 2012 at 1:55 AM, Haifeng Liu <[email protected]> wrote:
>> I want to write a hash function which acts as String.hashCode() in java:
>> hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How
>> can I avoid this? I saw java do not care overflow of int, it just make the
>> result negative.
>>
>>
>> Use the bitwise AND operator to mask the hash value with 0x3FFFFFF before
>> each iteration:
>>
>> hash = (hash & 67108863) * 31 + s.charAt(i);
>>
>> Craig
>
> Thank you, I believe your solution is OK for a hash function, but I am aiming
> to create a hash function that is consistent with the one applications use. I
> know postgresql 9.1 has a hash function called hashtext, but I don't know
> what algorithm it use, and I also see that it's not recommended to relay on
> it. So I am trying to create a hash function which behaves exactly the same
> as java.lang.String.hashCode(). The later one may generate negative hash
> value. I guess when the number is overflowing, the part out of range will be
> ignored, and if the highest bit get 1, the hash value turn to negative value.
>
> You are probably doing something where you want the application and the
> database to implement the exact same function, but if you stick to the Java
> built-in function, you will only have control over one implementation of that
> function. What happens if someone working on Java changes the how the Java
> internals work?
That's not the trouble, just create a hash tool which copies the code of
java.lang.String.hashCode() and use that tool instead will resolve this. The
key is, I know and I can reimplement the algorithm.
>
> A better solution would be to implement your own hash function in Postgres,
> and then once you know exactly how it will work, re-implement it in Java with
> your own code. That's the only way you can ensure consistency between the
> two.
>
> Craig