[ClojureScript] Re: hash equality between Clojure and ClojureScript

2015-04-21 Thread Peter Taoussanis
Thanks Herwig, Francis - appreciate the assistance!

-- 
Note that posts from new members are moderated - please be patient with your 
first post.
--- 
You received this message because you are subscribed to the Google Groups 
ClojureScript group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojurescript+unsubscr...@googlegroups.com.
To post to this group, send email to clojurescript@googlegroups.com.
Visit this group at http://groups.google.com/group/clojurescript.


Re: [ClojureScript] Re: hash equality between Clojure and ClojureScript

2015-04-21 Thread Christian Weilbach

On 20.04.2015 20:02, Francis Avila wrote:
 There's no contract, but strings, keywords, and symbols should hash
 the same, and collections of these (vectors, lists, maps, sets)
 should hash the same.
 
 It's difficult to hash numbers the same between Clojure and
 ClojureScript.
 
 ClojureScript numbers are all doubles (because JS), so they should
 in theory hash the same as Clojure doubles. Clojure hashes doubles
 using Java's Double.hashCode(), which relies on knowing the exact
 bits of the double. These bits are not available in JavaScript (at
 least not easily or without using typed arrays). I'm also not sure
 if this particular implementation of hashCode is part of the Java
 spec (i.e. implemented the same by all JDKs and JVMs.)
 
 Of course in practice the same clojure form (as read) will not hash
 the same in clj and cljs because clj uses longs most of the time.
 
 You could take a hybrid approach where integers in cljs are hashed
 like longs in Clojure. This works up to 53 bits, but longs with
 more bits than that are not exactly representable in ClojureScript.
 This is an approach I was pursuing in my murmur3 hashing
 implementation for cljs:
 
 http://dev.clojure.org/jira/browse/CLJS-754
 https://github.com/favila/clojurescript/blob/murmur3/src/cljs/cljs/core.cljs#L
 https://github.com/favila/clojurescript/blob/murmur3/src/cljs/cljs/murmur3.cljs#L61
 
 
 In practice you will hash the same most of the time if you mostly
 deal with integer numbers, but any doubles, bigdecimals, or large
 integers will still hash differently.
 
 This is what happens in ClojureScript now:
 
 (js-mod (Math/floor o) 2147483647)

Hi,

I am the author of hasch (1), a library for cryptographic cross-platform
hashing, which Herwig mentioned. The numeric types were a problem for me
too, as Francis describes. I guess all the corner cases are difficult to
catch, so I decided to treat all numbers in edn data like doubles. This
is still fragile, because floating-point arithmetic in JavaScript differs
from integer arithmetic on the JVM. Sadly, JavaScript is really bad for
numerical tasks and I don't see a lightweight workaround. So if you
compute the same thing numerically on both runtimes and expect the
hashed results to be the same, there could be trouble.

Additionally, there is no Character type in JavaScript, so you have to
treat characters like strings as well. Normally, different types with the
same content should hash differently, except for seqs and vectors, to
my current understanding.

My implementation is just a recursive hashing scheme protocol that
allows swapping the hash function, so if you want something more
lightweight than SHA-512, you could maybe use a cross-platform murmur
implementation with hasch. Maps and sets are hashed elementwise and the
element hashes are XORed afterwards, which might cause performance
problems in your case. I haven't found a cheaper way to protect against
malicious collisions.
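
To illustrate the elementwise-then-XOR idea for sets and maps, here is a minimal JVM-only sketch. It is not hasch's actual code or API: `sha512-bytes`, `xor-bytes` and `unordered-hash` are hypothetical helper names, and `pr-str` stands in for the recursive per-element hashing described above.

```clojure
(import 'java.security.MessageDigest)

;; Hypothetical helpers for illustration only; not part of hasch.
(defn sha512-bytes
  "SHA-512 digest (64 bytes) of the UTF-8 encoding of s."
  [^String s]
  (.digest (MessageDigest/getInstance "SHA-512") (.getBytes s "UTF-8")))

(defn xor-bytes
  "Bytewise XOR of two equally long byte arrays."
  [a b]
  (byte-array (map bit-xor a b)))

(defn unordered-hash
  "XOR the digests of all elements. XOR is commutative and associative,
  so the result does not depend on iteration order.
  Assumption: pr-str replaces a real recursive element hash."
  [coll]
  (reduce xor-bytes
          (byte-array 64)                      ; all-zero start value
          (map (comp sha512-bytes pr-str) coll)))

;; Same elements in a different order give the same bytes:
(= (seq (unordered-hash #{:a :b :c}))
   (seq (unordered-hash [:c :b :a])))
```

Every element costs one full digest, which is where the performance concern for large maps and sets comes from. Note also that XOR cancels duplicate elements, which is harmless for sets and map entries but would not be for sequential collections.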

Cheers,
Christian

(1) https://github.com/ghubber/hasch




[ClojureScript] Re: hash equality between Clojure and ClojureScript

2015-04-20 Thread Francis Avila
There's no contract, but strings, keywords, and symbols should hash the same, 
and collections of these (vectors, lists, maps, sets) should hash the same.

It's difficult to hash numbers the same between Clojure and ClojureScript.

ClojureScript numbers are all doubles (because JS), so they should in theory 
hash the same as Clojure doubles. Clojure hashes doubles using Java's 
Double.hashCode(), which relies on knowing the exact bits of the double. These 
bits are not available in JavaScript (at least not easily or without using 
typed arrays). I'm also not sure if this particular implementation of hashCode 
is part of the Java spec (i.e. implemented the same by all JDKs and JVMs.)
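
For reference, the Javadoc for Double.hashCode() documents the result as `(int)(bits ^ (bits >>> 32))`, where `bits` comes from `Double.doubleToLongBits`. A minimal JVM-side sketch of that formula follows; `double-hash` is a hypothetical helper name, not something in Clojure or ClojureScript.

```clojure
;; Sketch: reproduce java.lang.Double.hashCode by hand on the JVM.
;; `double-hash` is a hypothetical name used only for this example.
(defn double-hash
  [d]
  (let [bits (Double/doubleToLongBits (double d))]
    ;; (int)(bits ^ (bits >>> 32)): fold the high 32 bits into the low 32
    (unchecked-int (bit-xor bits (unsigned-bit-shift-right bits 32)))))

(double-hash 1.0)                        ; => 1072693248 (0x3FF00000)
(= (double-hash 1.5) (.hashCode 1.5))    ; compare against the JDK
```

The arithmetic itself is simple. The hard part on the JavaScript side is getting at the 64 raw bits of the number in the first place.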

Of course in practice the same clojure form (as read) will not hash the same in 
clj and cljs because clj uses longs most of the time.

You could take a hybrid approach where integers in cljs are hashed like longs 
in Clojure. This works up to 53 bits, but longs with more bits than that are 
not exactly representable in ClojureScript. This is an approach I was pursuing 
in my murmur3 hashing implementation for cljs:

http://dev.clojure.org/jira/browse/CLJS-754
https://github.com/favila/clojurescript/blob/murmur3/src/cljs/cljs/core.cljs#L
https://github.com/favila/clojurescript/blob/murmur3/src/cljs/cljs/murmur3.cljs#L61
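
The hybrid policy can be sketched on the JVM side roughly as follows. This is not the code from the linked branch: `hybrid-hash` and `max-safe-int` are hypothetical names, and the cutoff assumes 2^53 - 1, the largest magnitude up to which every integer is exactly representable as a double.

```clojure
;; Sketch of the hybrid policy, JVM side only.
;; `max-safe-int` and `hybrid-hash` are hypothetical names.
(def max-safe-int 9007199254740991) ; 2^53 - 1

(defn hybrid-hash
  [n]
  (let [d (double n)]
    (if (and (== d (Math/rint d))             ; integer-valued (false for NaN)
             (<= (Math/abs d) max-safe-int))  ; exactly representable
      ;; hash integer-valued numbers the way Clojure hashes a long
      (hash (long d))
      ;; everything else falls back to the hash of the double
      (hash d))))

(= (hybrid-hash 1.0) (hash 1))   ; 1.0 is treated like the long 1
```

Under this policy an integer-valued double and the equal long hash the same, while fractional values and magnitudes beyond the cutoff keep the double hash.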


In practice you will hash the same most of the time if you mostly deal with 
integer numbers, but any doubles, bigdecimals, or large integers will still 
hash differently.

This is what happens in ClojureScript now:

(js-mod (Math/floor o) 2147483647)
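
To see what that expression does to a few inputs, here is a JVM-side imitation. It is a sketch, not ClojureScript's source: `cljs-style-number-hash` is a hypothetical name, `rem` stands in for `js-mod` (both take the sign of the dividend), and the `long` cast limits it to values that fit in a long.

```clojure
;; JVM-side imitation of the ClojureScript expression quoted above.
;; `cljs-style-number-hash` is a hypothetical name for this example.
(defn cljs-style-number-hash
  [o]
  ;; drop the fractional part, then take the remainder modulo 2^31 - 1
  (rem (long (Math/floor (double o))) 2147483647))

(cljs-style-number-hash 1)    ; => 1
(cljs-style-number-hash 1.5)  ; => 1, the fractional part is ignored
```

Small integers hash to themselves, and any two numbers with the same floor get the same hash.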




On Sunday, April 19, 2015 at 11:03:08 PM UTC-5, Peter Taoussanis wrote:
 Hi there,
 
 Am running Clojure 1.7.0-beta1, ClojureScript 0.0-3196.
 
 Just noticed:
 (hash 1) ; 1, ClojureScript
 (hash 1) ; 1392991556, Clojure
 (.hashCode 1) ; 1, Clojure
 
 i.e. numeric hashes aren't consistent between Clojure and ClojureScript.
 
 I'm assuming that's intentional?
 
 This got me wondering: is there an official contract somewhere describing 
 hash behaviour similarities we _can_ safely depend on?
 
 Keywords, strings, and collections of these seem to produce matching hashes 
 (?) - but is that dependable behaviour or subject to change?
 
 Thanks a lot, cheers! :-)
