I'm looking for the most performant way to transform a huge seq (size
250000) of maps into a single CSV.
The data structure looks something like:
(def data-struct
  (repeat 250000 {:one 1 :two 2 :three 3 :four 4}))
A naive implementation would be:
(let [f #(->> % (map (comp str val)) (clojure.string/join ","))]
  (->> data-struct
       (map f)
       (clojure.string/join "\n")))
However, this takes far too long for my application (on the order of tens of
seconds).
Another attempt using reducers:
(require '[clojure.core.reducers :as r])
(let [f #(->> % (map (comp str val)) (clojure.string/join ","))
      r-join (fn
               ([] nil)
               ([x y]
                (if (and x y) (str x "\n" y)
                  (if x (str x)
                    (if y (str y))))))]
  (->> data-struct
       (r/map f)
       (r/fold r-join)))
Still not great.
But looking at the sources of clojure.string/join and clojure.core/str, it
becomes apparent that both implementations create a new instance of
java.lang.StringBuilder for each element in the sequence. (I have to imagine
this is the main issue, even though GC seems to account for only ~5% of the
runtime.)
Would it make sense to instantiate one java.lang.StringBuilder for all of
the concatenation (and call java.lang.StringBuilder append)?
What's the best way to do this with idiomatic Clojure?
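To make the single-StringBuilder idea concrete, here is a minimal sketch of
what I have in mind. The name maps->csv is hypothetical, the column order is
simply whatever order (vals row) yields (as in the versions above), and I have
not benchmarked it, so treat it as an illustration rather than a tested
solution:

```clojure
(defn maps->csv
  "Hypothetical helper: joins the vals of each map with commas and the
  resulting rows with newlines, appending everything to one StringBuilder."
  [rows]
  (let [^StringBuilder sb (StringBuilder.)]
    ;; The outer reduce walks the rows; its accumulator is only a flag that
    ;; says whether we are still on the first row (no leading newline).
    (reduce (fn [first-row? row]
              (when-not first-row? (.append sb "\n"))
              ;; The inner reduce walks the values of one row, using the
              ;; same flag trick for the comma separator.
              (reduce (fn [first-val? v]
                        (when-not first-val? (.append sb ","))
                        (.append sb (str v))
                        false)
                      true
                      (vals row))
              false)
            true
            rows)
    (.toString sb)))

;; Usage, assuming data-struct is defined as above:
;; (maps->csv data-struct)
```

The StringBuilder is created and mutated only inside the function, so callers
still see a pure function from a seq of maps to a string.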
Thanks a lot!
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en