There are far more fearsome DataFrame wizards than me on this forum, but I would use something like this:

    using DataFrames, RDatasets
    df = dataset("datasets", "iris")
    by(df, :PetalWidth) do subdf
        subdf[indmin(subdf[:SepalLength]), :]
    end

although that keeps the entries with the lowest value in each group (SepalLength here, standing in for your sum) and discards everything else. Could you clarify what you mean by "I try to find a simple way to remove the duplicates that have the lowest sum without changing the values of b and c."?

On Friday, May 27, 2016 at 1:56:54 PM UTC-4, Fred wrote:
>
> Hi,
>
> I have a dataframe df2 and the last column is the sum = b+c :
>
> julia> df2
> 8x4 DataFrames.DataFrame
> │ Row │ a │ b │ c         │ sum       │
> ┝━━━━━┿━━━┿━━━┿━━━━━━━━━━━┿━━━━━━━━━━━┥
> │ 1   │ 1 │ 2 │ -0.163564 │ 1.83644   │
> │ 2   │ 2 │ 1 │ 0.731038  │ 1.73104   │
> │ 3   │ 3 │ 2 │ 0.0951149 │ 2.09511   │
> │ 4   │ 4 │ 1 │ 0.195321  │ 1.19532   │
> │ 5   │ 1 │ 2 │ 1.97058   │ 3.97058   │
> │ 6   │ 2 │ 1 │ 0.150826  │ 1.15083   │
> │ 7   │ 3 │ 2 │ 0.422046  │ 2.42205   │
> │ 8   │ 4 │ 1 │ -1.36549  │ -0.365486 │
>
> We can see that the column a has duplicates (1,2,3). I am trying to find a
> simple way to remove the duplicates that have the lowest sum without
> changing the values of b and c.
>
> I tried:
>
> julia> aggregate(df2, :a, maximum)
> 4x4 DataFrames.DataFrame
> │ Row │ a │ b_maximum │ c_maximum │ sum_maximum │
> ┝━━━━━┿━━━┿━━━━━━━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━━━━┥
> │ 1   │ 1 │ 2         │ 1.97058   │ 3.97058     │
> │ 2   │ 2 │ 1         │ 0.731038  │ 1.73104     │
> │ 3   │ 3 │ 2         │ 0.422046  │ 2.42205     │
> │ 4   │ 4 │ 1         │ 0.195321  │ 1.19532     │
>
> but this is wrong because I don't want b_maximum and c_maximum, only
> sum_maximum:
>
> Row │ a │ b │ c │ sum_maximum
>
> I don't think there is a simple way to do that, but I am asking just
> in case ;)
>
> Thank you very much in advance!
>
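
If the goal in the quoted message is to keep, for each value of a, the whole row with the largest sum (so b and c stay as they are in that row), the same group-and-pick idea might look roughly like the sketch below. It rests on two assumptions that are not from the thread: a recent DataFrames.jl release (1.x), where `groupby`/`combine` and `argmax` take the place of the older `by` and `indmax`, and the variable name `best`, which is only illustrative. The data is copied from the quoted df2.

```julia
using DataFrames

# Data copied from the quoted df2; the `sum` column is b + c.
df2 = DataFrame(
    a = [1, 2, 3, 4, 1, 2, 3, 4],
    b = [2, 1, 2, 1, 2, 1, 2, 1],
    c = [-0.163564, 0.731038, 0.0951149, 0.195321,
          1.97058, 0.150826, 0.422046, -1.36549],
)
df2[!, :sum] = df2.b .+ df2.c

# Assumes DataFrames.jl 1.x: for each value of `a`, keep the entire row
# whose `sum` is largest, so b and c are taken from that row unchanged.
best = combine(groupby(df2, :a)) do subdf
    subdf[argmax(subdf.sum), :]
end
```

On the quoted data this keeps original rows 5, 2, 7 and 4, one per value of a. Swapping `argmax` for `argmin` would keep the lowest-sum rows instead, which is what the iris example above does with SepalLength.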