Re: Performant flattening of nested data structures
Hi Mark, Here is a version runs about 1.3x faster on my setup. (defn flatten-keys [thing] (letfn [(-map-key [prefix k] (str prefix . (name k))) (-seq-key [prefix i] (str prefix [ i ])) (-flatten-entry [make-key prefix result entry] (let [[k v] entry] (-flatten-thing result (make-key prefix k) v))) (-flatten-thing [result key thing] (cond (map? thing) (reduce (partial -flatten-entry -map-key key) result (seq thing)) (sequential? thing) (reduce (partial -flatten-entry -seq-key key) result (map-indexed vector thing)) :else (assoc! result key thing)))] (persistent! (-flatten-thing (transient {}) (if (map? thing) $ ) thing And it can take an empty collection as argument. Sebastien Le mardi 17 février 2015 17:14:29 UTC+1, Mark Watson a écrit : I want to flatten a map of nested maps/vecs Currently I'm using the fn's: (defn- flatten-keys* [a ks m] (cond (map? m) (reduce into (map (fn [[k v]] (flatten-keys* a (if-not (empty? ks) (str ks . (name k)) (str $. (name k))) v)) (seq m))) (and (sequential? m) (not (instance? clojure.lang.MapEntry m))) (reduce into (map-indexed (fn [idx itm] (flatten-keys* a (str ks [ idx ]) itm)) (seq m))) :else (assoc a ks m))) (defn flatten-keys [m] (flatten-keys* {} m)) (flatten-keys {:name {:first Rich :last Hickey} :number [1 415 123 4567]}) ;; = {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 4567, $.name.first Rich, $.name.last Hickey} However, I was wondering if anyone had suggestions on a more performant fn? I have a white-list of possible flattened keys. But, the list is big (~1200), and with with most (~95%) of the keys having corresponding vals = nil it was much less performant using get-in on every possible key. Thanks for the help! -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Performant flattening of nested data structures
You've probably gotten nearly all of the performance you're going to get out of it without resorting to mutability, perhaps by threading a StringBuilder instance through the whole affair, or at least accumulating not a string (via a sequence of str calls) but a coll and doing a big apply str over that at the base case: (assoc a (apply str ks) m), which amounts to the same thing (should end up using a single StringBuilder under the hood to assemble the entire string in one pass, without creating a temporary String object at each intermediate level). -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Performant flattening of nested data structures
I want to flatten a map of nested maps/vecs Currently I'm using the fn's: (defn- flatten-keys* [a ks m] (cond (map? m) (reduce into (map (fn [[k v]] (flatten-keys* a (if-not (empty? ks) (str ks . (name k)) (str $. (name k))) v)) (seq m))) (and (sequential? m) (not (instance? clojure.lang.MapEntry m))) (reduce into (map-indexed (fn [idx itm] (flatten-keys* a (str ks [ idx ]) itm)) (seq m))) :else (assoc a ks m))) (defn flatten-keys [m] (flatten-keys* {} m)) (flatten-keys {:name {:first Rich :last Hickey} :number [1 415 123 4567]}) ;; = {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 4567, $.name.first Rich, $.name.last Hickey} However, I was wondering if anyone had suggestions on a more performant fn? I have a white-list of possible flattened keys. But, the list is big (~1200), and with with most (~95%) of the keys having corresponding vals = nil it was much less performant using get-in on every possible key. Thanks for the help! -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Performant flattening of nested data structures
A slightly cleaner version: (defn- flatten-keys* [a ks m] (cond ;; Is a map? (map? m) (reduce into (map (fn [[k v]] (flatten-keys* a (str ks . (name k)) v)) (seq m))) ;; Is an arr/vec/seq? (and (sequential? m) (not (instance? clojure.lang.MapEntry m))) (reduce into (map-indexed (fn [idx itm] (flatten-keys* a (str ks [ idx ]) itm)) (seq m))) ;; Is not a collection :else (assoc a ks m))) (defn flatten-keys [m] (flatten-keys* {} $ m)) -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Performant flattening of nested data structures
i wasn't able to squeeze that much more out of it, informally maybe around 10% as it gets larger. (defn- flx ([obj] (flx obj [$] {})) ([obj curr res] (cond (map? obj) (reduce #(flx (%2 obj) (conj curr (str . (name %2))) %1) res (keys obj)) (sequential? obj) (reduce #(flx (second %2) (conj curr (str [ (first %2) ])) %1) res (map #(identity (list %1 %2)) (range (count obj)) obj)) :else (assoc res (apply str curr) obj On Tuesday, February 17, 2015 at 11:18:58 AM UTC-5, Mark Watson wrote: A slightly cleaner version: (defn- flatten-keys* [a ks m] (cond ;; Is a map? (map? m) (reduce into (map (fn [[k v]] (flatten-keys* a (str ks . (name k)) v)) (seq m))) ;; Is an arr/vec/seq? (and (sequential? m) (not (instance? clojure.lang.MapEntry m))) (reduce into (map-indexed (fn [idx itm] (flatten-keys* a (str ks [ idx ]) itm)) (seq m))) ;; Is not a collection :else (assoc a ks m))) (defn flatten-keys [m] (flatten-keys* {} $ m)) -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Performant flattening of nested data structures
This is probably easier if you do it in two passes: one to assemble the get-in path down to a value, and another to stringify that path. (def input {:name {:first Rich :last Hickey} :number [1 415 123 4567]}) ;= #'user/input (def expected {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 4567, $.name.first Rich, $.name.last Hickey}) ;= #'user/expected (defn keypaths ([xs] (keypaths [] xs)) ([path-prefix xs] (reduce-kv (fn [kp+v k v] (let [path (conj path-prefix k)] (if (associative? v) ;; You may need to change this. (into kp+v (keypaths path v)) (conj kp+v [path v] [] xs))) ;= #'user/keypaths (defn keypath-str [path] (let [str-part (fn [x] (cond (or (string? x) (char? x)) (str \. x) (instance? clojure.lang.Named x) (str \. (name x)) :else (str \[ x \]))) str-path ^String (apply str (map str-part path))] (if (.startsWith str-path .) (subs str-path 1) str-path))) ;= #'user/keypath-str (keypaths input) ;= [[[:number 0] 1] [[:number 1] 415] [[:number 2] 123] [[:number 3] 4567] [[:name :first] Rich] [[:name :last] Hickey]] (keypaths [$] input) ;= [[[$ :number 0] 1] [[$ :number 1] 415] [[$ :number 2] 123] [[$ : number 3] 4567] [[$ :name :first] Rich] [[$ :name :last] Hickey]] (- (keypaths [$] input) (map (fn [[path v]] [(keypath-str path) v])) (into {})) ;= {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 4567, $.name.first Rich, $.name.last Hickey} (= *1 expected) ;= true So that increases readability. What about performance? I'm not entirely sure what you are trying to optimize. You mention a whitelist of keys. Is there something related to that you want to optimize? Maybe you don't need to stringify keys at all? Perhaps you need to represent your whitelist as a set of vectors (each of which is a keypath), then filter the un-stringified keypaths from maps by that whitelist: (def whitelist #{[:number 0] [:number 1] [:number 2] [:name :first] [:name :last]}) ;= #'user/whitelist (- (keypaths input) (filter (comp whitelist first))) ;= ([[:number 0] 1] [[:number 1] 415] [[:number 2] 123] [[:name :first] Rich] [[:name :last] Hickey]) If the problem is that you have extremely large maps to flatten, you can try folding reducers. There are two opportunities for parallelism: constructing the paths, and stringifying the paths. Stringifying the paths is more obvious: (require '[clojure.core.reducers :as r]) ;= nil (defn r-flatten-assoc [m] (- (keypaths [\$] m) (r/map (fn [[path v]] [(keypath-str path) v] ;= #'breeze.mast.repl/r-flatten-assoc (- (r-flatten-assoc input) (r/foldcat) (into {}) (= expected) ;= true But paralleling construction of paths is a bit harder. The following example only folds over maps; to fold over the vectors with index without lazy seqs or realizing maps requires a custom reducer. (I also use transients for good measure.) (defn r-keypaths ([xs] (r-keypaths [] xs)) ([path-prefix xs] (if (map? xs) (- xs (r/mapcat (fn [k v] (let [path (conj path-prefix k)] (if (associative? v) (r-keypaths path v) [[path v]] (r/foldcat)) (- (reduce-kv (fn [kp+v k v] (let [path (conj path-prefix k)] (if (associative? v) (reduce conj! kp+v (r-keypaths path v)) (conj! kp+v [path v] (transient []) xs) (persistent!) On Tuesday, February 17, 2015 at 10:18:58 AM UTC-6, Mark Watson wrote: A slightly cleaner version: (defn- flatten-keys* [a ks m] (cond ;; Is a map? (map? m) (reduce into (map (fn [[k v]] (flatten-keys* a (str ks . (name k)) v)) (seq m))) ;; Is an arr/vec/seq? (and (sequential? m) (not (instance? clojure.lang.MapEntry m))) (reduce into (map-indexed (fn [idx itm] (flatten-keys* a (str ks [ idx ]) itm)) (seq m))) ;; Is not a collection :else (assoc a ks m))) (defn flatten-keys [m] (flatten-keys* {} $ m)) -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group,