On Mon, Oct 27, 2014 at 7:10 AM, Brian Craft <[email protected]> wrote:
> I found iota, which looks like a good solution for the read portion of the
> problem. However I also need to process the data in the file. If I start
> with an iota/vec and need to sort it, something like
>
> (sort (iota/vec "foo"))
>
Short disclaimer: I'm the one to blame for Iota.
In this situation I've found it easier to sort the file first with GNU sort
(or another external tool) and then use Iota on the pre-sorted file. Another
option is to build an index from a smaller subset of the data.
Examples:
$ sort -k2 data.tsv > sorted_data.tsv
or
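(def data (iota/numbered-vec "data.tsv"))

;; Map each key to its line number; the sorted-map keeps keys in order.
;; Note: if a key appears on several lines, only the last one is kept.
(def index (->> data
                (map (fn [line]
                       (let [[linenum key & _]
                             (clojure.string/split line #"\t" -1)]
                         [key linenum])))
                (into (sorted-map))))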
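
If keys can repeat, a variant of the index that keeps every line number per
key may be more useful. The following is only a rough sketch: build-index and
sorted-line-numbers are hypothetical helper names, the sample vector stands in
for the result of iota/numbered-vec, and it assumes each line looks like
"linenum<TAB>key<TAB>...", as in the example above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: build a sorted index of key -> vector of line numbers.
;; Assumes each line is "linenum<TAB>key<TAB>...".
(defn build-index [numbered-lines]
  (reduce (fn [idx line]
            (let [[linenum k] (str/split line #"\t" -1)]
              ;; Append this line number to whatever is already stored for k.
              (update idx k (fnil conj []) (Long/parseLong linenum))))
          (sorted-map)
          numbered-lines))

;; Hypothetical helper: line numbers in ascending key order.
(defn sorted-line-numbers [index]
  (mapcat val index))

;; Stand-in for (iota/numbered-vec "data.tsv"); only three short lines here.
(def sample ["0\tpear\tx" "1\tapple\ty" "2\tpear\tz"])

(sorted-line-numbers (build-index sample))
;; => (1 0 2)
```

Only the keys and line numbers are held in memory; the full lines can then be
looked up one at a time in that order.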
While Iota does use mmap under the hood for reading/caching, it can't
reduce memory consumption if all of the data ends up converted to strings.
For most "Clojurey" operations, though, strings are preferred. So the trick
is to avoid realizing the entire data set in memory and to treat it like a
stream instead.
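
As a minimal sketch of the streaming idea, here is a reduce over a lazy
line-seq that keeps only a small summary in memory. It uses plain
clojure.java.io rather than Iota, and count-by-key and its col argument are
hypothetical names chosen for illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: count rows per value of the tab-separated column
;; at zero-based index `col`. Lines are read lazily and discarded after
;; each step, so only the counts map stays in memory.
(defn count-by-key [path col]
  (with-open [rdr (io/reader path)]
    ;; reduce is eager, so the file is fully consumed before rdr is closed.
    (reduce (fn [counts line]
              (let [k (nth (str/split line #"\t" -1) col nil)]
                (update counts k (fnil inc 0))))
            {}
            (line-seq rdr))))

;; Usage, assuming data.tsv exists and the key is in the second column:
;; (count-by-key "data.tsv" 1)
```

The same shape works for any aggregate that can be built one line at a time;
anything that needs all lines at once (such as a full sort) does not fit it.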
I'd love to add a mechanism to Iota to solve this problem, but I haven't
come upon a good solution yet.
I'm all ears though!
Hope this helps,
TheBusby
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en