[julia-users] Re: DataFrame : aggregate with only on column possible ?

Fred Sat, 28 May 2016 00:26:18 -0700

Thank you, Cedric!

To clarify, here is an example:


│ Row │ a │ b │ c  │ sum │
┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━┥
│ 1   │ X │ 2 │ 10 │ 12  │
│ 2   │ Y │ 1 │ 3  │ 4   │
│ 3   │ Z │ 2 │ 5  │ 7   │
│ 4   │ X │ 1 │ 20 │ 21  │
│ 5   │ X │ 2 │ 5  │ 7   │
│ 6   │ Z │ 1 │ 8  │ 9   │

I want to obtain :

│ Row │ a │ b │ c  │ sum_max │
┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━━┥
│ 1   │ X │ 1 │ 20 │ 21      │
│ 2   │ Y │ 1 │ 3  │ 4       │
│ 3   │ Z │ 1 │ 8  │ 9       │

You can see that the rows are unchanged, just filtered to keep, for each
value of a, the row with the maximum sum. In particular, column b contains
only 1. With aggregate(df2, :a, maximum) this is not the case, because I
would also obtain the maximum of b (2, 1, 2) and of c. When there are
duplicates in column a (X, X, X), for example:


│ Row │ a │ b │ c  │ sum_max │
┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━━┥
│ 1   │ X │ 2 │ 10 │ 12      │
│ 4   │ X │ 1 │ 20 │ 21      │
│ 5   │ X │ 2 │ 5  │ 7       │

I want to remove rows 1 and 5 because their sums are lower than that of
row 4. So the result is:

│ Row │ a │ b │ c  │ sum_max │
┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━━┥
│ 4   │ X │ 1 │ 20 │ 21      │

I don't want b_max and c_max, only sum_max. I hope my explanation is
clearer now :)
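
For illustration, the selection described above could be sketched roughly as
follows. This is only a sketch under assumptions: it uses the groupby and
combine functions of a recent DataFrames.jl (1.x) rather than aggregate, and
the name result is a placeholder that does not come from the original code.

```julia
using DataFrames

# Example data copied from the first table; `sum` is b + c.
df2 = DataFrame(a = ["X", "Y", "Z", "X", "X", "Z"],
                b = [2, 1, 2, 1, 2, 1],
                c = [10, 3, 5, 20, 5, 8])
df2.sum = df2.b .+ df2.c

# For each value of `a`, keep the whole row whose `sum` is largest.
# argmax returns the index of the first maximum, so a tie keeps the first row.
result = combine(groupby(df2, :a)) do sub
    sub[argmax(sub.sum), :]
end

# Rename the column to match the desired output and order by the group key.
rename!(result, :sum => :sum_max)
sort!(result, :a)
```

Because each group contributes one complete row, b and c keep the values from
the selected row instead of being replaced by their own per-column maxima.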
