Hi Christian, hasch looks nice, I might end up just using it. I will be hashing smaller collections (maps where keys are keywords and values are atomic data like integers).
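For small maps of this shape (keyword keys, atomic values), one way to get a hash that is stable across runs and insertion orders can be sketched as below. This is a minimal illustration, not hasch's approach: the function names `canonical-string` and `bucket-index` are hypothetical, the 16-bit result size is an arbitrary choice for the example, and it assumes that `pr-str` output for keywords, strings and integers stays the same across the Clojure versions in use.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn canonical-string
  "Print a flat map with its entries sorted by key, so the result does
  not depend on insertion order. Assumes keyword keys and atomic values."
  [m]
  (pr-str (into (sorted-map) m)))

(defn bucket-index
  "Hypothetical helper: MD5 the canonical string and keep the first two
  digest bytes as an unsigned 16-bit number (0 to 65535 inclusive)."
  [m]
  (let [^bytes input  (.getBytes ^String (canonical-string m)
                                 StandardCharsets/UTF_8)
        ^bytes digest (.digest (MessageDigest/getInstance "MD5") input)
        hi            (bit-and (aget digest 0) 0xff)  ; bytes are signed on the JVM
        lo            (bit-and (aget digest 1) 0xff)]
    (bit-or (bit-shift-left hi 8) lo)))

;; Both calls yield the same index, whatever the key order was.
(bucket-index {:foo "Bar" :baz 5})
(bucket-index {:baz 5 :foo "Bar"})
```

Sorting before printing is what makes the result order-independent here; it only works while all keys are mutually comparable, which holds for keywords.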
Collisions BTW are not such a big deal for my use case. I will have a limited number of fragments (buckets, index pages, etc.) anyway. 65536 of them perhaps. The more I think about the problem the more I realize I am implementing some sort of hash map.

On Mon, Aug 10, 2015 at 3:49 PM, Christian Weilbach <whitesp...@polyc0l0r.net> wrote:

> Hi,
>
> I am the author of https://github.com/whilo/hasch
>
> Would calling hasch.core/edn-hash satisfy your performance
> requirements? I tried hard to make the recursion of the protocol
> performant, but hashing a value is slower than the time needed to
> write the data to disk for big collections. You should pick a faster
> message-digest like you suggested, e.g. MD5:
>
> (defn ^MessageDigest md5-message-digest []
>   (MessageDigest/getInstance "md5"))
>
> (edn-hash {:foo "Bar" :baz 5} md5-message-digest)
>
> You can use the criterium benchmarking snippets in platform.clj to do
> benchmarks. Object.hashCode() is a lot faster still and caches the
> result, I am not sure how much overhead the protocol dispatch causes.
>
> Note that if some collisions are ok for you, you might find a better
> tradeoff, since atm. commutative collections like maps and sets are
> hashed key-value wise and then XOR'd for safety. I am interested in
> your findings and decision, especially if you pick something else.
>
> Christian
>
> On 10.08.2015 09:00, Atamert Ölçgen wrote:
> > Hi,
> >
> > I need a way to reduce a compound value, say {:foo "bar"}, into a
> > number (like 693d9a0698aff95c in hex). I don't necessarily need a
> > very large hash space, 7 hex digits is good enough for my purposes.
> > But I need this hash to be consistent between runs and JVM versions
> > etc. So I guess that rules out standard object hashes.
> >
> > I would like to find a sufficiently fast way to do this. I can live
> > with MD5, but are there faster alternatives (but produce smaller
> > hashes)?
> > (clj-digest <https://github.com/tebeka/clj-digest> provides a nice
> > interface to what Java provides but there are only the usual
> > suspects AFAICS
> > <http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#MessageDigest>)
> >
> > I will be dealing with unordered collections, but it seems hashing
> > is consistent when the input order is changed:
> >
> > user=> (.hashCode {:foo "Bar" :baz 5})
> > 2040536238
> > user=> (.hashCode {:baz 5 :foo "Bar"})
> > 2040536238
> >
> > (It even gave the same hash code in different runs.)
> >
> > I will use these hashes to build index tables. My data, which
> > contains these things I hash, is a set. I will store this as an
> > ordered set and keep an index pointing to where records from this
> > hash to that hash live. This is all Clojure, but I can't keep all
> > my data in memory. (So Clojure's persistent data structures are out
> > of the picture. Life would've been much simpler if I could.)
> >
> > Thanks for reading. Any insight is appreciated.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Kind Regards,
Atamert Ölçgen

◻◼◻
◻◻◼
◼◼◼

www.muhuk.com