Re: RE: Multi-level bucketing problem
I'm sure this can be simplified:

    (defn mlg [attrs data]
      (if (empty? attrs)
        [(reduce + (map :mv data)) {:children data}]
        (let [parts    (group-by (first attrs) data)
              subtrees (map (fn [[value data]]
                              [value (mlg (rest attrs)
                                          (map #(dissoc % (first attrs)) data))])
                            parts)]
          (reduce (fn [[sum tree] [value [sumsubtree subtree]]]
                    [(+ sum sumsubtree)
                     (update-in tree [:children] conj
                                (assoc subtree
                                       :path [(first attrs) value]
                                       :mv sumsubtree))])
                  [0.0 {:children []}]
                  subtrees))))

Returns a pair with the sum for all items and a tree. Each tree is represented as a dictionary, and inner nodes of the tree have three keys:

- :mv the sum of the :mv's of its children
- :path an attr-value pair shared by all the leaves in the subtree
- :children the subtrees of this level

Leaves are represented as dictionaries with only the keys :sec_id and :mv.

I've forked your gist, so you can grab the code directly from GitHub: https://gist.github.com/952861

Best regards,
Juan Manuel

--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
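To make the returned shape concrete, here is a small usage sketch of mlg. The function body is the one posted above with its closing parentheses restored; the four sample records and the name `sample` are made up for illustration and use only two grouping levels.

```clojure
;; mlg as posted in the thread, closing parentheses restored.
(defn mlg [attrs data]
  (if (empty? attrs)
    [(reduce + (map :mv data)) {:children data}]
    (let [parts    (group-by (first attrs) data)
          subtrees (map (fn [[value data]]
                          [value (mlg (rest attrs)
                                      (map #(dissoc % (first attrs)) data))])
                        parts)]
      (reduce (fn [[sum tree] [value [sumsubtree subtree]]]
                [(+ sum sumsubtree)
                 (update-in tree [:children] conj
                            (assoc subtree
                                   :path [(first attrs) value]
                                   :mv sumsubtree))])
              [0.0 {:children []}]
              subtrees))))

;; Hypothetical sample records (not from the thread).
(def sample
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 10.0}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 5.0}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 2.5}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 4.0}])

;; Walk the first two levels of the tree and print each node's path and total.
(let [[total tree] (mlg [:attr1 :attr2] sample)]
  (println "total:" total)
  (doseq [region (:children tree)]
    (println (:path region) (:mv region))
    (doseq [country (:children region)]
      (println "  " (:path country) (:mv country)))))
```

Note that the root node itself carries only :children; its total is the first element of the returned pair.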
RE: RE: Multi-level bucketing problem
Whoa! Thanks Juan. I will start to understand/analyze this...

From: clojure@googlegroups.com [mailto:clojure@googlegroups.com] On Behalf Of JuanManuel Gimeno Illa
Sent: Tuesday, May 03, 2011 11:40 AM
To: clojure@googlegroups.com
Subject: Re: RE: Multi-level bucketing problem

> I'm sure this can be simplified: [...]
Re: Multi-level bucketing problem
One way is not to use a tree structure but to aggregate by composed keys, starting with [:attr1], then [:attr1 :attr2], and so on:

    (defn sum-by [data attrs]
      (let [aggregated (group-by (apply juxt attrs) data)]
        (zipmap (keys aggregated)
                (map #(reduce + (map :mv %)) (vals aggregated)))))

    (println (sum-by data [:attr1 :attr2]))
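As a sketch of how the composed-key idea could cover every level at once, the snippet below calls sum-by for each prefix of the attribute list. sum-by is the function from this message with its parentheses closed; `sum-by-levels` and the `sample` records are hypothetical names added for illustration.

```clojure
;; sum-by as posted in the thread, closing parentheses restored.
(defn sum-by [data attrs]
  (let [aggregated (group-by (apply juxt attrs) data)]
    (zipmap (keys aggregated)
            (map #(reduce + (map :mv %)) (vals aggregated)))))

;; Hypothetical helper: one totals map per key prefix,
;; i.e. [:attr1], then [:attr1 :attr2], and so on.
(defn sum-by-levels [data attrs]
  (into {}
        (for [n (range 1 (inc (count attrs)))
              :let [prefix (vec (take n attrs))]]
          [prefix (sum-by data prefix)])))

;; Hypothetical sample records (not from the thread).
(def sample
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 10.0}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 5.0}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 2.5}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 4.0}])

(println (sum-by-levels sample [:attr1 :attr2]))

;; The grand total is computed separately, because juxt needs at least
;; one function and so sum-by cannot be called with an empty attrs vector.
(println (reduce + (map :mv sample)))
```

Each inner map is keyed by a vector of attribute values, so a node's total can be looked up by its path, for example `["America" "USA"]`.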
Re: Multi-level bucketing problem
Google Collections has a multimap that could help you with this. It is pretty cool. As a note, they actually push immutability pretty hard - maybe Rich got to them?

http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/collect/Multimaps.html

On May 1, 10:57 pm, Bhinderwala, Shoeb sabhinderw...@wellington.com wrote:
> Hi fellow clojurers: I need help to group my data at multiple levels (currently 4) using a tree structure and perform aggregate calculations at each level. [...]
RE: Multi-level bucketing problem
Hi Miki

What you provided is an amazing piece of code. I need a lot more time to understand how it works since it uses so many higher-order functions.

I have uploaded my original problem code base and your solution at: https://gist.github.com/952382

This is just a simplified version of my problem. What we are trying to do is group data in hierarchies and then perform all sorts of calculations on each grouping at each level. The calculations at the lower grouping levels also roll up and feed calculations at the higher levels.

Your idea of not creating a tree and simply using functions to compute the values for the groups (nodes) is certainly very thought-provoking.

Thanks
Shoeb

From: clojure@googlegroups.com [mailto:clojure@googlegroups.com] On Behalf Of Miki
Sent: Monday, May 02, 2011 11:33 AM
To: clojure@googlegroups.com
Subject: Re: Multi-level bucketing problem

> One way is not to use a tree structure but to aggregate by composed keys, starting with [:attr1] then [:attr1 :attr2] ... [...]
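The roll-up requirement described above (lower levels feeding higher levels) can be sketched without a tree as well: compute totals once at the deepest level, then derive each higher level from the level below by dropping the last key component. This is only a sketch under the assumption that the calculation is a plain sum; `leaf-sums`, `roll-up`, `all-levels` and `sample` are hypothetical names, not code from the thread.

```clojure
;; Hypothetical helper: totals at the deepest level, keyed by the full
;; vector of attribute values, e.g. ["America" "USA"].
(defn leaf-sums [data attrs]
  (reduce (fn [acc rec]
            (update-in acc [((apply juxt attrs) rec)] (fnil + 0.0) (:mv rec)))
          {}
          data))

;; Hypothetical helper: derive the next level up from an existing level
;; by dropping the last component of each key and adding the totals.
(defn roll-up [sums]
  (reduce (fn [acc [path total]]
            (update-in acc [(pop path)] (fnil + 0.0) total))
          {}
          sums))

;; Deepest level first, then one roll-up per attribute; the last map has
;; the single key [] holding the grand total.
(defn all-levels [data attrs]
  (reductions (fn [level _] (roll-up level))
              (leaf-sums data attrs)
              attrs))

;; Hypothetical sample records (not from the thread).
(def sample
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 10.0}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 5.0}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 2.5}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 4.0}])

(doseq [level (all-levels sample [:attr1 :attr2])]
  (println level))
```

This shortcut only works for calculations that can be combined from partial results, such as sums or counts. Anything that needs the raw records at every level would have to go back to the grouped data.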
Multi-level bucketing problem
Hi fellow clojurers:

I need help to group my data at multiple levels (currently 4) using a tree structure and perform aggregate calculations at each level.

I have the Clojure code below to generate a list of test records for me (using the get-rec function).

    (def reg-cntry-list
      {"America"     ["USA" "Canada" "Mexico" "Venezuala" "Brazil" "Argentina" "Cuba"]
       "Asia"        ["India" "Pakistan" "Singapore" "China" "Japan" "Sri Lanka" "Malaysia"]
       "Europe"      ["UK" "Germany" "France" "Italy" "Belgium" "Turkey" "Finland"]
       "Middle East" ["Saudi Arabia" "Bahrain" "UAE" "Kuwait" "Yemen" "Qatar" "Iraq"]
       "Africa"      ["Libya" "Tanzania" "South Africa" "Kenya" "Ethiopia" "Morocco" "Zimbabwe"]})

    (def sec-ind-list
      {"Basic Materials" ["Apparel" "Auto Part" "Building" "Packaged"]
       "Consumer Goods"  ["Beveragess" "Cigarettes" "Drugs" "Newspapers"]
       "Financial"       ["Life Insurance" "Banking" "Investment" "Funds"]
       "Healthcare"      ["Home care" "Hospitals" "Plans" "Medical"]
       "Industrial"      ["Chemicals" "Cleaning" "Machine" "Lumber"]
       "Services"        ["Advertising" "Broadcasting" "Education" "Publishing"]
       "Technology"      ["Biotechnology" "Computers" "Data Storage" "Electronics"]
       "Utilities"       ["Farm Products" "Electric" "Gas" "Oil"]})

    (defn get-rec []
      (let [r (rand-nth (keys reg-cntry-list))
            s (rand-nth (keys sec-ind-list))]
        {:sec_id (rand-int 1000)
         :attr1  r
         :attr2  (rand-nth (reg-cntry-list r))
         :attr3  s
         :attr4  (rand-nth (sec-ind-list s))
         :mv     (rand 100)}))

    ;generate 50 random records
    (def data (take 50 (repeatedly get-rec)))

Each record (map) has the following keys:

    :sec_id - security id
    :attr1  - attribute 1 of the security
    :attr2  - attribute 2 of the security
    :attr3  - attribute 3 of the security
    :attr4  - attribute 4 of the security
    :mv     - market value of the security

What I need is a tree-like data structure that can group my data in four levels (based on attr1, attr2, attr3 and attr4) as follows and also store total market value (total mv) for each level:

    root (total mv of all recs)
    |___America (total mv for America)
    |   |___USA (total mv for America/USA)
    |   |   |___Financial (total mv for America/USA/Financial)
    |   |   |   |___Banking (total mv for America/USA/Financial/Banking)
    |   |   |   |     :sec_id 1 :mv 889
    |   |   |   |     :sec_id 2 :mv 393
    |   |   |   |___Funds (total mv for America/USA/Financial/Funds)
    |   |   |         :sec_id 3 :mv 33
    |   |   |         :sec_id 4 :mv 393
    |   |   |___Technology
    |   |       |___Electronics
    |   |       |     :sec_id 5 :mv 93
    |   |       |     :sec_id 6 :mv 29
    |   |       |___Data Storage
    |   |             :sec_id 7 :mv 389
    |   |             :sec_id 8 :mv 93
    |   |___Canada
    |       |___Industrial
    |           |___Machine
    |           |     :sec_id 10 :mv 34
    |           |     :sec_id 11 :mv 93
    |           |___Lumber
    |                 :sec_id 12 :mv 93
    |                 :sec_id 13 :mv 93
    |___Europe (total mv for Europe)
        |___Germany
        |   |___Financial
        |   |   |___Banking
        |   |         :sec_id 1 :mv 93
        |   |         :sec_id 2 :mv 93
        |   |___Technology
        |       |___Electronics
        |             :sec_id 5 :mv 93
        |             :sec_id 6 :mv 93
        |___France
            |___Industrial
                |___Lumber
                      :sec_id 12 :mv 93
                      :sec_id 13 :mv 93

I tried the group-by function but can't write it to get the nested data I want and to perform the aggregate computations at each level.

I am learning Clojure using this exercise, which is part of a bigger project that I am interested in re-writing using Clojure.

Any help greatly appreciated.

Thanks
Shoeb
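For reference, one possible way to get a nested shape like the one drawn in this post is to fold the records into a nested map in a single pass, adding each record's :mv to every node on its path. This is a minimal sketch, not code from the thread: `add-rec`, `bucket`, the :recs key and the `sample` records are all hypothetical, and the example uses two levels instead of four to stay short.

```clojure
;; Hypothetical helper: add one record to a node. Every node on the
;; record's path gets the record's :mv added to its own :mv total.
;; Inner nodes keep a :children map keyed by attribute value; the
;; deepest nodes keep the records themselves under :recs.
(defn add-rec [node attrs rec]
  (let [node (update-in node [:mv] (fnil + 0.0) (:mv rec))]
    (if (empty? attrs)
      (update-in node [:recs] (fnil conj []) (select-keys rec [:sec_id :mv]))
      (update-in node [:children ((first attrs) rec)]
                 add-rec (rest attrs) rec))))

;; Hypothetical entry point: fold all records into an empty root node.
(defn bucket [attrs data]
  (reduce (fn [tree rec] (add-rec tree attrs rec)) {} data))

;; Hypothetical sample records (not from the thread).
(def sample
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 10.0}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 5.0}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 2.5}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 4.0}])

(def tree (bucket [:attr1 :attr2] sample))

;; Totals can then be read at any level with get-in.
(println (:mv tree))
(println (get-in tree [:children "America" :mv]))
(println (get-in tree [:children "America" :children "USA" :recs]))
```

With the four-level data from this post the call would be `(bucket [:attr1 :attr2 :attr3 :attr4] data)`.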