On Mon, Oct 27, 2014 at 7:10 AM, Brian Craft <craft.br...@gmail.com> wrote:
> I found iota, which looks like a good solution for the read portion of the
> problem. However I also need to process the data in the file. If I start
> with an iota/vec and need to sort it, something like
>
> (sort (iota/vec "foo"))
>

Short disclaimer: I'm the one to blame for Iota.

In this situation I've found it easier just to use GNU sort, or another external tool, and then use Iota with the pre-sorted file. Or generate an index from a smaller subset of the data.

Examples:

    $ cat data.tsv | sort -k2 > sorted_data.tsv

or

    (def data (iota/numbered-vec "data.tsv"))

    ;; Build a sorted map from each line's key to its line number.
    (def index
      (->> data
           (map (fn [line]
                  (let [[linenum key & _] (clojure.string/split line #"\t" -1)]
                    [key linenum])))
           (into (sorted-map))))

While Iota does use mmap under the hood for reading/caching, it can't reduce memory consumption if all of the data is converted to strings. For most "Clojurey" operations though, strings are preferred. So the trick is to avoid realizing the entire data set in memory and to treat it like a stream instead.

I'd love to add a mechanism to Iota to solve this problem, but I haven't come upon a good solution yet. I'm all ears though!

Hope this helps,
TheBusby
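
A slightly stricter sketch of the pre-sort step, in case the fields can contain spaces: by default `sort -k2` splits fields on whitespace and extends the key to the end of the line, so for a tab-separated file it may be safer to set the delimiter and end the key explicitly. The sample rows below are made up for illustration; only the file names come from the example above.

```shell
# Hypothetical sample data: tab-separated, sort key in column 2.
printf 'r1\tpear\t3\nr2\tapple\t7\nr3\tfig\t1\n' > data.tsv

# A literal tab character, built portably.
TAB="$(printf '\t')"

# -t sets the field delimiter; -k2,2 limits the key to field 2 only.
# LC_ALL=C forces byte-order comparison so the result does not depend on locale.
LC_ALL=C sort -t "$TAB" -k2,2 data.tsv > sorted_data.tsv

cat sorted_data.tsv
```

GNU sort spills to temporary files when the input does not fit in its buffer, which is why it copes with files larger than memory; the pre-sorted file can then be opened with Iota as before.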