On Jan 21, 2014, at 07:11 , Chris Perkins <[email protected]> wrote:
> This part: (some #{hashed} already-seen) is doing a linear lookup in
> `already-seen`. Try (contains? already-seen hashed) instead.
Or just (already-seen hashed), given that the OP isn't storing nil (or false) hashes — a set used as a function returns the member or nil, so it works as a truthy test unless nil/false can be members.
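To make the two idioms concrete, here's a small sketch (the set contents are made up):

```clojure
;; A Clojure set is itself a lookup function: (s x) returns the
;; member if present, nil otherwise. Both lookups are O(1)-ish
;; (hash-based), unlike the linear scan done by `some`.
(def already-seen #{"a1b2" "c3d4"})

(already-seen "a1b2")            ;; => "a1b2" (truthy)
(already-seen "zzzz")            ;; => nil
(contains? already-seen "a1b2")  ;; => true, safe even if nil/false
                                 ;;    could be members
```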
To OP: note that if you’re storing the hashes as strings (as it appears),
you’re using 16 more bytes per hash than necessary. If you’re really going to
be dealing with so many URLs that you’d use too much memory by storing the
unique URLs directly, then you should probably be storing the hashes as byte
arrays.
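A sketch of the byte-array approach, assuming MD5 hashes (the OP's hash function isn't stated). One Clojure-specific wrinkle: Java byte arrays compare by identity, so to get value equality inside a set you need to wrap them, e.g. as vectors:

```clojure
(import 'java.security.MessageDigest)

;; Hash a URL to raw MD5 bytes (16 bytes) rather than the
;; 32-character hex string representation.
(defn md5-bytes [^String url]
  (.digest (MessageDigest/getInstance "MD5")
           (.getBytes url "UTF-8")))

;; Raw byte arrays use identity equality in Java, so wrap each
;; one in a vector to get value-based set membership.
(defn unique-count [urls]
  (count (into #{} (map (comp vec md5-bytes)) urls)))
```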
Alternatively, if you’re going to be dealing with REALLY large files and are
running on Linux/BSD, consider dumping just the URLs to a file and using “sort
-u” on it. UNIX Sort can efficiently handle files that are too large to fit in
memory, via external merge sort.
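For example, assuming the URLs have been dumped one per line to a file named urls.txt:

```shell
# sort -u deduplicates via external merge sort, spilling to
# temporary files when the input doesn't fit in memory.
sort -u urls.txt > unique-urls.txt

# Or, if only the count of unique URLs is needed:
sort -u urls.txt | wc -l
```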
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en