Re: Performant flattening of nested data structures

2015-02-18 Thread Sébastien Bocq
Hi Mark,

Here is a version runs about 1.3x faster on my setup.

(defn flatten-keys [thing]
  (letfn [(-map-key [prefix k] (str prefix . (name k)))
  (-seq-key [prefix i] (str prefix [ i ]))
  (-flatten-entry [make-key prefix result entry]
(let [[k v] entry]
  (-flatten-thing result (make-key prefix k) v)))
  (-flatten-thing [result key thing]
(cond (map? thing)
  (reduce (partial -flatten-entry -map-key key)
  result
  (seq thing))
  (sequential? thing)
  (reduce (partial -flatten-entry -seq-key key)
  result
  (map-indexed vector thing))
  :else (assoc! result key thing)))]
(persistent!
 (-flatten-thing (transient {}) (if (map? thing) $ ) thing
And it can take an empty collection as argument.


Sebastien
Le mardi 17 février 2015 17:14:29 UTC+1, Mark Watson a écrit :

 I want to flatten a map of nested maps/vecs

 Currently I'm using the fn's:

 (defn- flatten-keys* [a ks m]

   (cond

(map? m) (reduce into

 (map (fn [[k v]]

(flatten-keys* a (if-not (empty? ks)

   (str ks . (name k))

   (str $. (name k))) v)) (seq 
 m)))

(and (sequential? m)

 (not (instance? clojure.lang.MapEntry m))) (reduce into

(map-indexed (fn 
 [idx itm]

   
 (flatten-keys* a

   
(str ks [ idx ])

   
itm))

 (seq 
 m)))

:else (assoc a ks m)))

 (defn flatten-keys [m] (flatten-keys* {}  m))

 (flatten-keys {:name {:first Rich :last Hickey} :number [1 415 123 4567]})

 ;; = {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 
 4567, $.name.first Rich, $.name.last Hickey}


 However, I was wondering if anyone had suggestions on a more performant fn?

 I have a white-list of possible flattened keys. But, the list is big 
 (~1200), and with with most (~95%) of the keys having corresponding vals = 
 nil it was much less performant using get-in on every possible key.

 Thanks for the help!



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performant flattening of nested data structures

2015-02-17 Thread Fluid Dynamics
You've probably gotten nearly all of the performance you're going to get 
out of it without resorting to mutability, perhaps by threading a 
StringBuilder instance through the whole affair, or at least accumulating 
not a string (via a sequence of str calls) but a coll and doing a big 
apply str over that at the base case: (assoc a (apply str ks) m), which 
amounts to the same thing (should end up using a single StringBuilder under 
the hood to assemble the entire string in one pass, without creating a 
temporary String object at each intermediate level).

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Performant flattening of nested data structures

2015-02-17 Thread Mark Watson
I want to flatten a map of nested maps/vecs

Currently I'm using the fn's:

(defn- flatten-keys* [a ks m]

  (cond

   (map? m) (reduce into

(map (fn [[k v]]

   (flatten-keys* a (if-not (empty? ks)

  (str ks . (name k))

  (str $. (name k))) v)) (seq m)))

   (and (sequential? m)

(not (instance? clojure.lang.MapEntry m))) (reduce into

   (map-indexed (fn 
[idx itm]

  
(flatten-keys* a


 (str ks [ idx ])


 itm))

(seq 
m)))

   :else (assoc a ks m)))

(defn flatten-keys [m] (flatten-keys* {}  m))

(flatten-keys {:name {:first Rich :last Hickey} :number [1 415 123 4567]})

;; = {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 
4567, $.name.first Rich, $.name.last Hickey}


However, I was wondering if anyone had suggestions on a more performant fn?

I have a white-list of possible flattened keys. But, the list is big 
(~1200), and with with most (~95%) of the keys having corresponding vals = 
nil it was much less performant using get-in on every possible key.

Thanks for the help!

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performant flattening of nested data structures

2015-02-17 Thread Mark Watson
A slightly cleaner version:

(defn- flatten-keys* [a ks m]

  (cond

   ;; Is a map?

   (map? m) (reduce into

(map (fn [[k v]]

   (flatten-keys* a (str ks . (name k)) v))

 (seq m)))

   ;; Is an arr/vec/seq?

   (and (sequential? m)

(not (instance? clojure.lang.MapEntry m))) (reduce into

   (map-indexed (fn 
[idx itm]

  
(flatten-keys* a


 (str ks [ idx ])


 itm))

(seq 
m)))

   ;; Is not a collection

   :else (assoc a ks m)))

(defn flatten-keys [m] (flatten-keys* {} $ m))

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performant flattening of nested data structures

2015-02-17 Thread Ivan L
i wasn't able to squeeze that much more out of it, informally maybe around 
10% as it gets larger.

(defn- flx 
  ([obj] (flx obj [$] {}))
  ([obj curr res]
(cond
  (map? obj)
  (reduce 
#(flx (%2 obj) (conj curr (str . (name %2))) %1) 
res 
(keys obj))
  (sequential? obj)
  (reduce 
#(flx (second %2) (conj curr (str [ (first %2) ])) %1) 
res 
(map #(identity (list %1 %2)) (range (count obj)) obj))
  :else
  (assoc res (apply str curr) obj


On Tuesday, February 17, 2015 at 11:18:58 AM UTC-5, Mark Watson wrote:

 A slightly cleaner version:

 (defn- flatten-keys* [a ks m]

   (cond

;; Is a map?

(map? m) (reduce into

 (map (fn [[k v]]

(flatten-keys* a (str ks . (name k)) v))

  (seq m)))

;; Is an arr/vec/seq?

(and (sequential? m)

 (not (instance? clojure.lang.MapEntry m))) (reduce into

(map-indexed (fn 
 [idx itm]

   
 (flatten-keys* a

   
(str ks [ idx ])

   
itm))

 (seq 
 m)))

;; Is not a collection

:else (assoc a ks m)))

 (defn flatten-keys [m] (flatten-keys* {} $ m))



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Performant flattening of nested data structures

2015-02-17 Thread Francis Avila
This is probably easier if you do it in two passes: one to assemble the 
get-in path down to a value, and another to stringify that path. 

(def input {:name {:first Rich :last Hickey} :number [1 415 123 4567]})
;= #'user/input
(def expected {$.number[0] 1, $.number[1] 415, $.number[2] 123, 
$.number[3] 4567, $.name.first Rich, $.name.last Hickey})
;= #'user/expected
(defn keypaths
  ([xs] (keypaths [] xs))
  ([path-prefix xs]
(reduce-kv
  (fn [kp+v k v]
(let [path (conj path-prefix k)]
  (if (associative? v) ;; You may need to change this.
(into kp+v (keypaths path v))
(conj kp+v [path v]
  []
  xs)))
;= #'user/keypaths
(defn keypath-str [path]
  (let [str-part
(fn [x] (cond
  (or (string? x) (char? x)) (str \. x)
  (instance? clojure.lang.Named x) (str \. (name x))
  :else (str \[ x \])))
str-path ^String (apply str (map str-part path))]
(if (.startsWith str-path .)
  (subs str-path 1)
  str-path)))
;= #'user/keypath-str
(keypaths input)
;= [[[:number 0] 1] [[:number 1] 415] [[:number 2] 123] [[:number 3] 4567] 
[[:name :first] Rich] [[:name :last] Hickey]]
(keypaths [$] input)
;= [[[$ :number 0] 1] [[$ :number 1] 415] [[$ :number 2] 123] [[$ :
number 3] 4567] [[$ :name :first] Rich] [[$ :name :last] Hickey]]
(- (keypaths [$] input)
 (map (fn [[path v]] [(keypath-str path) v]))
 (into {}))
;= {$.number[0] 1, $.number[1] 415, $.number[2] 123, $.number[3] 
4567, $.name.first Rich, $.name.last Hickey}
(= *1 expected)
;= true

So that increases readability. What about performance?

I'm not entirely sure what you are trying to optimize. You mention a 
whitelist of keys. Is there something related to that you want to optimize? 
Maybe you don't need to stringify keys at all? Perhaps you need to 
represent your whitelist as a set of vectors (each of which is a keypath), 
then filter the un-stringified keypaths from maps by that whitelist:

(def whitelist #{[:number 0] [:number 1] [:number 2] [:name :first] [:name 
:last]})
;= #'user/whitelist
(- (keypaths input)
 (filter (comp whitelist first)))
;= ([[:number 0] 1] [[:number 1] 415] [[:number 2] 123] [[:name :first] 
Rich] [[:name :last] Hickey])


If the problem is that you have extremely large maps to flatten, you can 
try folding reducers. There are two opportunities for parallelism: 
constructing the paths, and stringifying the paths. Stringifying the paths 
is more obvious:

(require '[clojure.core.reducers :as r])
;= nil
(defn r-flatten-assoc [m]
  (- (keypaths [\$] m)
   (r/map (fn [[path v]] [(keypath-str path) v]
;= #'breeze.mast.repl/r-flatten-assoc
(- (r-flatten-assoc input) (r/foldcat) (into {}) (= expected)
;= true


But paralleling construction of paths is a bit harder. The following 
example only folds over maps; to fold over the vectors with index without 
lazy seqs or realizing maps requires a custom reducer. (I also use 
transients for good measure.)

(defn r-keypaths
  ([xs] (r-keypaths [] xs))
  ([path-prefix xs]
(if (map? xs)
  (- xs
   (r/mapcat (fn [k v]
   (let [path (conj path-prefix k)]
 (if (associative? v)
   (r-keypaths path v)
   [[path v]]
   (r/foldcat))
  (- (reduce-kv
 (fn [kp+v k v]
   (let [path (conj path-prefix k)]
 (if (associative? v)
   (reduce conj! kp+v (r-keypaths path v))
   (conj! kp+v [path v]
 (transient [])
 xs)
   (persistent!)







On Tuesday, February 17, 2015 at 10:18:58 AM UTC-6, Mark Watson wrote:

 A slightly cleaner version:

 (defn- flatten-keys* [a ks m]

   (cond

;; Is a map?

(map? m) (reduce into

 (map (fn [[k v]]

(flatten-keys* a (str ks . (name k)) v))

  (seq m)))

;; Is an arr/vec/seq?

(and (sequential? m)

 (not (instance? clojure.lang.MapEntry m))) (reduce into

(map-indexed (fn 
 [idx itm]

   
 (flatten-keys* a

   
(str ks [ idx ])

   
itm))

 (seq 
 m)))

;; Is not a collection

:else (assoc a ks m)))

 (defn flatten-keys [m] (flatten-keys* {} $ m))



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group,