Re: Multi-level bucketing problem

2011-05-03 Thread JuanManuel Gimeno Illa
I'm sure this can be simplified:

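(defn mlg [attrs data]
  (if (empty? attrs)
    ;; No attributes left: this group's records become the leaves.
    [(reduce + (map :mv data)) {:children data}]
    ;; Group on the first attribute and recurse on the rest, dropping the
    ;; attribute already used so leaves end up with only :sec_id and :mv.
    (let [parts    (group-by (first attrs) data)
          subtrees (map (fn [[value data]]
                          [value (mlg (rest attrs)
                                      (map #(dissoc % (first attrs)) data))])
                        parts)]
      ;; Accumulate the running total and attach each subtree, tagged with
      ;; its [attr value] path and its own total.
      (reduce (fn [[sum tree] [value [sumsubtree subtree]]]
                [(+ sum sumsubtree)
                 (update-in tree [:children] conj
                            (assoc subtree
                                   :path [(first attrs) value]
                                   :mv sumsubtree))])
              [0.0 {:children []}]
              subtrees))))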

Returns a pair with the sum for all items and a tree. 

Each tree is represented as a map, and inner nodes of the tree have
three keys:
- :mv the sum of the :mv values of its children
- :path the [attr value] pair that selects this subtree (every leaf below it had that value for that attribute)
- :children the subtrees one level down

Leaves are represented as maps with only the keys :sec_id and :mv.
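The nested maps are easier to check when printed in the indented layout from the original question. This is a minimal sketch assuming the [total tree] pair shape described above; `tree-lines`, `render-result` and the hand-built `sample-result` are hypothetical and not part of the posted code.

```clojure
;; Hypothetical helpers (not from the thread) that render the [total tree]
;; pair returned by mlg as indented text lines.

(defn tree-lines
  "Inner nodes print their path value and :mv total; leaves (maps with
  :sec_id) print their :sec_id and :mv. Each level adds two spaces."
  [node depth]
  (let [indent (apply str (repeat depth "  "))]
    (if (contains? node :sec_id)
      [(str indent ":sec_id " (:sec_id node) " :mv " (:mv node))]
      (cons (str indent (second (:path node)) " (" (:mv node) ")")
            (mapcat #(tree-lines % (inc depth)) (:children node))))))

(defn render-result
  "Takes mlg's [total root] pair; the root node itself has only :children."
  [[total root]]
  (vec (cons (str "root (" total ")")
             (mapcat #(tree-lines % 1) (:children root)))))

;; Hand-built, made-up result in the same shape, as if two attribute
;; levels had been grouped.
(def sample-result
  [120
   {:children [{:path [:attr1 "America"] :mv 120
                :children [{:path [:attr2 "USA"] :mv 80
                            :children [{:sec_id 1 :mv 50} {:sec_id 2 :mv 30}]}
                           {:path [:attr2 "Canada"] :mv 40
                            :children [{:sec_id 3 :mv 40}]}]}]}])

(run! println (render-result sample-result))
```

For the sample this prints `root (120)`, then `America (120)`, `USA (80)` and its two leaves, each level indented two more spaces. With real mlg output the totals are doubles, since the reduction starts from 0.0.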

Best regards,

Juan Manuel



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: RE: Multi-level bucketing problem

2011-05-03 Thread JuanManuel Gimeno Illa
I've forked your gist, so you can grab the code directly from GitHub:
https://gist.github.com/952861

Best regards,

Juan Manuel


RE: RE: Multi-level bucketing problem

2011-05-03 Thread Bhinderwala, Shoeb
Whoa! Thanks Juan. I will start to understand/analyze this...

 



From: clojure@googlegroups.com [mailto:clojure@googlegroups.com] On
Behalf Of JuanManuel Gimeno Illa
Sent: Tuesday, May 03, 2011 11:40 AM
To: clojure@googlegroups.com
Subject: Re: RE: Multi-level bucketing problem

 


Re: Multi-level bucketing problem

2011-05-02 Thread Miki
One way is not to use a tree structure but to aggregate by composed keys, 
starting with [:attr1] then [:attr1 :attr2] ...

(defn sum-by [data attrs]
  ;; group records by the tuple of their attr values, then sum :mv per group
  (let [aggregated (group-by (apply juxt attrs) data)]
    (zipmap (keys aggregated)
            (map #(reduce + (map :mv %)) (vals aggregated)))))

(println (sum-by data [:attr1 :attr2]))
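Calling sum-by once per prefix of the attribute list gives the totals for every level in one flat map. A minimal sketch; `sums-at-all-levels` and `sample-data` are hypothetical names, and the records are made up.

```clojure
;; Miki's sum-by, with its closing parentheses restored.
(defn sum-by [data attrs]
  (let [aggregated (group-by (apply juxt attrs) data)]
    (zipmap (keys aggregated)
            (map #(reduce + (map :mv %)) (vals aggregated)))))

;; Hypothetical helper: totals for every prefix of attrs, i.e. [:attr1],
;; then [:attr1 :attr2], and so on. Keys of different lengths cannot
;; collide, so a plain merge is enough.
(defn sums-at-all-levels [data attrs]
  (apply merge (for [n (range 1 (inc (count attrs)))]
                 (sum-by data (take n attrs)))))

;; Made-up sample records.
(def sample-data
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 50}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 30}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 40}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 10}])

(println (sums-at-all-levels sample-data [:attr1 :attr2]))
```

For the sample, ["America"] maps to 120 and ["America" "USA"] to 80. The root total is simply (reduce + (map :mv data)).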


Re: Multi-level bucketing problem

2011-05-02 Thread Base
Google Collections has a multimap that could help you with this.  It
is pretty cool.

As a note they actually push immutability pretty hard - maybe Rich got
to them?

http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/collect/Multimaps.html



On May 1, 10:57 pm, Bhinderwala, Shoeb
sabhinderw...@wellington.com wrote:

RE: Multi-level bucketing problem

2011-05-02 Thread Bhinderwala, Shoeb
Hi Miki

 

What you provided is an amazing piece of code. I need a lot more time to
understand how it works since it uses so many higher-order functions.

 

I have uploaded my original problem code base and your solution at:

 

   https://gist.github.com/952382

 

This is just a simplified version of my problem. What we are trying to
do is group data in hierarchies and then perform all sorts of
calculations on each grouping at each level. The calculations at the
lower grouping levels also roll up and feed calculations at the higher
levels.
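One way to let lower-level results feed the higher levels is to compute each node's values only from its children's values, never from the raw records again. This sketch assumes every calculation can be written as a per-record stats map plus an associative combine function (here sums and counts via merge-with +); `rollup` and `sample-data` are hypothetical, and something like an average would have to be derived from such sums and counts afterwards.

```clojure
;; Hypothetical roll-up (not from the thread). Records are grouped one
;; attribute per level; the bottom groups get their stats from the records,
;; and every higher node combines only its children's stats.
(defn rollup [attrs records record-stats combine]
  (if (empty? attrs)
    {:stats   (reduce combine (map record-stats records))
     :records records}
    (let [groups (into {}
                       (for [[v rs] (group-by (first attrs) records)]
                         [v (rollup (rest attrs) rs record-stats combine)]))]
      {:stats  (reduce combine (map :stats (vals groups)))
       :groups groups})))

;; Made-up sample records.
(def sample-data
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 50}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 30}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 40}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 10}])

;; Two calculations at once: market value total and record count.
(def rolled
  (rollup [:attr1 :attr2] sample-data
          (fn [rec] {:mv (:mv rec) :count 1})
          (partial merge-with +)))

(println (get-in rolled [:groups "America" :stats]))
```

Here America's stats, {:mv 120 :count 3}, are built from USA's {:mv 80 :count 2} and Canada's {:mv 40 :count 1}; adding another calculation only means extending the per-record stats map.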

 

Your idea of not creating a tree and simply using functions to compute
the values for the groups (nodes) is certainly very thought-provoking.
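Taken further, the no-tree approach treats a node as nothing more than a set of attribute constraints whose value is computed when asked for. A minimal sketch; `node-total` and the sample records are hypothetical.

```clojure
;; Hypothetical helper (not from the thread): a "node" is a map of
;; attribute constraints, and its total :mv is computed on demand.
(defn node-total [data path]
  (->> data
       (filter (fn [rec] (every? (fn [[k v]] (= v (get rec k))) path)))
       (map :mv)
       (reduce +)))

;; Made-up sample records.
(def sample-data
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 50}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 30}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 40}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 10}])

(println (node-total sample-data {:attr1 "America" :attr2 "USA"})) ; 80
(println (node-total sample-data {}))                              ; 130, the root
```

Each query rescans all records, so this suits occasional lookups better than computing every node of a large hierarchy.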

 

Thanks

Shoeb

 



From: clojure@googlegroups.com [mailto:clojure@googlegroups.com] On
Behalf Of Miki
Sent: Monday, May 02, 2011 11:33 AM
To: clojure@googlegroups.com
Subject: Re: Multi-level bucketing problem

 


Multi-level bucketing problem

2011-05-01 Thread Bhinderwala, Shoeb
Hi fellow clojurers:

I need help to group my data at multiple levels (currently 4) using a
tree structure and perform aggregate calculations at each level.

I have the Clojure code below to generate a list of test records (using
the get-rec function).

(def reg-cntry-list
  {"America"     ["USA" "Canada" "Mexico" "Venezuela" "Brazil" "Argentina" "Cuba"]
   "Asia"        ["India" "Pakistan" "Singapore" "China" "Japan" "Sri Lanka" "Malaysia"]
   "Europe"      ["UK" "Germany" "France" "Italy" "Belgium" "Turkey" "Finland"]
   "Middle East" ["Saudi Arabia" "Bahrain" "UAE" "Kuwait" "Yemen" "Qatar" "Iraq"]
   "Africa"      ["Libya" "Tanzania" "South Africa" "Kenya" "Ethiopia" "Morocco" "Zimbabwe"]})

(def sec-ind-list
  {"Basic Materials" ["Apparel" "Auto Part" "Building" "Packaged"]
   "Consumer Goods"  ["Beverages" "Cigarettes" "Drugs" "Newspapers"]
   "Financial"       ["Life Insurance" "Banking" "Investment" "Funds"]
   "Healthcare"      ["Home care" "Hospitals" "Plans" "Medical"]
   "Industrial"      ["Chemicals" "Cleaning" "Machine" "Lumber"]
   "Services"        ["Advertising" "Broadcasting" "Education" "Publishing"]
   "Technology"      ["Biotechnology" "Computers" "Data Storage" "Electronics"]
   "Utilities"       ["Farm Products" "Electric" "Gas" "Oil"]})

(defn get-rec []
  ;; pick a random region and sector, then a country and industry within them
  (let [r (rand-nth (keys reg-cntry-list))
        s (rand-nth (keys sec-ind-list))]
    {:sec_id (rand-int 1000)
     :attr1  r
     :attr2  (rand-nth (reg-cntry-list r))
     :attr3  s
     :attr4  (rand-nth (sec-ind-list s))
     :mv     (rand 100)}))

;generate 50 random records
(def data (take 50 (repeatedly get-rec)))
  
Each record (map) has the following keys:
  :sec_id - security id
  :attr1 - attribute 1 of the security
  :attr2 - attribute 2 of the security
  :attr3 - attribute 3 of the security
  :attr4 - attribute 4 of the security
  :mv - market value of the security

What I need is a tree-like data structure that groups my data in four
levels (based on attr1, attr2, attr3 and attr4) as follows and that also
stores the total market value (total mv) for each level:

  root (total mv of all recs)
    |America (total mv for America)
           |
           |USA (total mv for America/USA)
                 |
                 |Financial (total mv for America/USA/Financial)
                         |
                         |___Banking (total mv for America/USA/Financial/Banking)
                                |
                                | :sec_id 1 :mv 889
                                | :sec_id 2 :mv 393
                         |___Funds (total mv for America/USA/Financial/Funds)
                                |
                                | :sec_id 3 :mv 33
                                | :sec_id 4 :mv 393
                 |Technology
                         |
                         |___Electronics
                                |
                                | :sec_id 5 :mv 93
                                | :sec_id 6 :mv 29
                         |___Data Storage
                                |
                                | :sec_id 7 :mv 389
                                | :sec_id 8 :mv 93
           |Canada
                 |
                 |Industrial
                         |
                         |___Machine
                                |
                                | :sec_id 10 :mv 34
                                | :sec_id 11 :mv 93
                         |___Lumber
                                |
                                | :sec_id 12 :mv 93
                                | :sec_id 13 :mv 93
    |___Europe (total mv for Europe)
           |
           |Germany
                 |
                 |Financial
                         |
                         |___Banking
                                |
                                | :sec_id 1 :mv 93
                                | :sec_id 2 :mv 93
                 |Technology
                         |
                         |___Electronics
                                |
                                | :sec_id 5 :mv 93
                                | :sec_id 6 :mv 93
           |France
                 |
                 |Industrial
                         |
                         |___Lumber
                                |
                                | :sec_id 12 :mv 93
                                | :sec_id 13 :mv 93

  
I tried the group-by function, but I can't get it to produce the nested
data I want or to perform the aggregate computations at each level.
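Nesting group-by amounts to applying it recursively, one attribute per level, to each group's records. A minimal sketch; `nest-by`, `subtotal` and the sample records are hypothetical, and unlike a tree that stores its totals, the totals here are recomputed on every call.

```clojure
;; Hypothetical helper (not from the thread): nests group-by one level per
;; attribute, giving {value {value ... [records]}}.
(defn nest-by [attrs records]
  (if (empty? attrs)
    records
    (into {}
          (for [[v rs] (group-by (first attrs) records)]
            [v (nest-by (rest attrs) rs)]))))

;; Sums :mv over everything below a node: inner nodes are maps of
;; value -> subnode, the bottom level is a vector of records.
(defn subtotal [node]
  (if (map? node)
    (reduce + (map subtotal (vals node)))
    (reduce + (map :mv node))))

;; Made-up sample records.
(def sample-data
  [{:sec_id 1 :attr1 "America" :attr2 "USA"    :mv 50}
   {:sec_id 2 :attr1 "America" :attr2 "USA"    :mv 30}
   {:sec_id 3 :attr1 "America" :attr2 "Canada" :mv 40}
   {:sec_id 4 :attr1 "Europe"  :attr2 "France" :mv 10}])

(def nested (nest-by [:attr1 :attr2] sample-data))

(println (get-in nested ["America" "USA"]))  ; the two USA records
(println (subtotal (get nested "America")))  ; 120
```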

I am learning Clojure through this exercise, which is part of a bigger
project that I am interested in rewriting in Clojure.

Any help greatly appreciated.

Thanks
Shoeb

