There are far more fearsome DataFrame wizards than I am on this forum, but 
I would use something like this:

using RDatasets

df = dataset("datasets", "iris")

# For each distinct PetalWidth, keep the single row with the smallest SepalLength
by(df, :PetalWidth) do subdf
    subdf[indmin(subdf[:SepalLength]), :]
end

although that keeps only the entry with the lowest value in each group (the 
lowest sum, in your case) and discards everything else. Could you clarify 
what you mean by "I try to find a simple way to remove the duplicates that 
have the lowest sum without changing the values of b and c."?

On Friday, May 27, 2016 at 1:56:54 PM UTC-4, Fred wrote:
>
> Hi,
>
> I have a dataframe df2 and the last column is the sum = b+c :
>
> julia> df2
> 8x4 DataFrames.DataFrame
> │ Row │ a │ b │ c         │ sum       │
> ┝━━━━━┿━━━┿━━━┿━━━━━━━━━━━┿━━━━━━━━━━━┥
> │ 1   │ 1 │ 2 │ -0.163564 │ 1.83644   │
> │ 2   │ 2 │ 1 │ 0.731038  │ 1.73104   │
> │ 3   │ 3 │ 2 │ 0.0951149 │ 2.09511   │
> │ 4   │ 4 │ 1 │ 0.195321  │ 1.19532   │
> │ 5   │ 1 │ 2 │ 1.97058   │ 3.97058   │
> │ 6   │ 2 │ 1 │ 0.150826  │ 1.15083   │
> │ 7   │ 3 │ 2 │ 0.422046  │ 2.42205   │
> │ 8   │ 4 │ 1 │ -1.36549  │ -0.365486 │
>
> We can see that the column a has duplicates (1,2,3). I try to find a 
> simple way to remove the duplicates that have the lowest sum without 
> changing the values of b and c.
>
> I tried :
>
> julia> aggregate(df2, :a,  maximum)
>
>
> 4x4 DataFrames.DataFrame
> │ Row │ a │ b_maximum │ c_maximum │ sum_maximum │
> ┝━━━━━┿━━━┿━━━━━━━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━━━━┥
> │ 1   │ 1 │ 2         │ 1.97058   │ 3.97058     │
> │ 2   │ 2 │ 1         │ 0.731038  │ 1.73104     │
> │ 3   │ 3 │ 2         │ 0.422046  │ 2.42205     │
> │ 4   │ 4 │ 1         │ 0.195321  │ 1.19532     │
>
>
>
> but this is wrong because I don't want b_maximum and c_maximum but only 
> sum_maximum :
>
>
> Row │ a │ b │ c │ sum_maximum
>
>
> I don't think that there is a simple way to do that but I ask the question 
> in case ;)
>
> Thank you very much in advance!
>
